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Preface 



The idea of evolving machines, whose origins can be traced to the cybernetics 
movement of the 1940s and 1950s, has recently resurged in the form of the nascent 
field of bio-inspired systems and evolvable hardware. The inaugural workshop. 
Towards Evolvable Hardware, took place in Lausanne in October 1995, followed 
by the First International Conference on Evolvable Systems: From Biology to 
Hardware (ICES), held in Tsukuba, Japan in October 1996. The second ICES 
conference was held in Lausanne in September 1998, with the third and fourth 
being held in Edinburgh, April 2000 and Tokyo, October 2001 respectively. This 
has become the leading conference in the field of evolvable systems and the 2003 
conference promised to be at least as good as, if not better than, the four that 
proceeded it. 

The fifth international conference was built on the success of its predeces- 
sors, aiming at presenting the latest developments in the field. In addition, it 
brought together researchers who use biologically inspired concepts to imple- 
ment real systems in artificial intelligence, artificial life, robotics, VLSI design 
and related domains. We would say that this fifth conference followed on from 
the previous four in that it consisted of a number of high-quality interesting 
thought-provoking papers. 

We received 58 papers in total. All of these papers were reviewed by three 
independent reviewers. As such, we feel that we compiled an excellent pack- 
age for ICES 2003. The conference included 3 keynote talks titled: “Nano- and 
biotechnology,” “From wheels to wings with evolutionary spiking neurons,” and 
“Machine design of quantum computers: A new frontier.” We had 41 technical 
presentations, a panel debate, and 3 tutorials in the areas of: evolutionary algo- 
rithms, evolvable hardware and reconfigurable devices, and nanotechnology. In 
addition to the technical program, there was a strong and varied social program 
both during and after the conference. 

We would like to thank the reviewers for their time and effort in reviewing 
all of the submitted papers. We would also like to thank the other members of 
the organizing committee, including the local chair Keith Downing and public- 
ity chair Gunnar Tufte. We are most grateful to Erode Eskelund, Diego Federici, 
and Karstein A. Kristiansen for their great help in developing the web-based 
paper submission and registration tool. We wish to thank the following for their 
contribution to the success of this conference: The Research Council of Nor- 
way; European Community 1ST programme; Norwegian University of Science 
and Technology; University of Oslo; European Office of Aerospace Research and 
Development, Air Force Office of Scientific Research, United States Air Force 
Research Laboratory; Telenor ASA; Atmel; and Siemens. Finally, we would like 
to thank all of those authors who put so much effort into their research and 
decided to publish their work at our conference. 
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Preface 



What topics might we consider important for the next few years? It is clear 
that the field is still developing, evolving, and that many of the “hot” top- 
ics at this conference will be seen again next time. These will include: evo- 
lutionary hardware design; co-evolution of hybrid systems; evolving hardware 
systems; intrinsic, and on-line evolution; hardware/software co-evolution; self- 
repairing hardware; self-reconfiguring hardware; embryonic hardware; morpho- 
genesis; novel devices; adaptive computing; and of course we are always looking 
for, and interested in, real-world applications of evolvable hardware. Will we see 
breakthroughs relating to nanotechnology, new reconfigurable FPGAs and new 
models of reliability for long space missions? Only time will tell - as with all 
evolution. 

We hope you enjoy reading these proceedings as much as we enjoyed putting 
them together. 



January 2003 Andy M. Tyrrell 

Pauline C. Haddow 
Jim Torresen 
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Abstract. Many observers have marveled at the beauty of the syn- 
chronous flashing of fireflies that has an almost hypnotic effect. In this pa- 
per we consider the issue of evolving two-dimensional cellular automata 
as well as random boolean networks to solve the firefly synchronization 
task. The task was successfully solved by means of cellular program- 
ming based co-evolution performing computations in a completely local 
manner, each cell having access only to its immediate neighbor’s states. 
An FPGA-based Evolware implementation on the Bio Wall’s cellular tis- 
sue and different other simulations show that the approach is very effi- 
cient and easily implementable in hardware. 



1 Introduction 

The mutual synchronization of oscillators - a both surprising and interesting 
phenomenon - in living things and nature is ubiquitous (see for example M)- 
dancing (natural response to music), pacemaker cells in the heart, the circa- 
dian clock of multicellular organisms (consists of multiple autonomous single- 
cell oscillators) |^, the nervous system that controls rhythmic behavior such 
as breathing, running and chewing, synchronous flashing of fireflies, choruses of 
cicadas or crickets m , etc. 

An oscillator can be defined in a loosely manner as a system that executes 
a periodic behavior. The pendulum is probably the best example of a periodic 
behavior in space and time. Electronic circuits offer many different types of 
oscillators that produce an output signal of a specific frequency. Often, a very 
stable mechanical oscillator, such as a specially prepared quartz crystal, may be 
coupled to an electronic oscillator to enhance its frequency stability. 

Once the behavior of a single oscillator is understood, the more complex 
behavior of coupled oscillators can be investigated, although the equations gov- 
erning their behavior usually become quickly intractable. Strogarz and Stew- 
art m write that the most familiar mode of organization for coupled oscillators 
is synchrony. “One of the most spectacular examples of this kind of coupling 
can be seen along the tidal rivers of Malaysia, Thailand and New Guinea, where 
thousands of male fireflies gather together in trees at night and flash on and off 
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in unison in an attempt to attract the females that cruise overhead. When the 
males arrive at dusk, their flickerings are uncoordinated. As the night deepens, 
pockets of synchrony begin to emerge and grow. Eventually whole trees pulsate 
in a silent, hypnotic concert that continues for hours lEHl- Synchronization in 
living things usually emerges spontaneously and through cooperative behavior: 
if a few individuals happen to synchronize, they tend to exert stronger influ- 
ence on their common neighbors. See Buck and Buck [0 for a review of various 
theories and information about synchronous fireflies. 

Another well known example is the synchronization of menstrual cycles among 
women friends or roommates M- There exist various ideas about the mecha- 
nism of synchronization, but an experiment in 1980 has shown that it might 
have something to do with sweat m- 

In this paper, we concentrate on the two-dimensional firefly synchronization 
task for non-uniform CAs (each automaton may have a different rule) and ran- 
dom boolean networks. The two-dimensional version of the task is in principle 
similar to the one-dimensional version, although the speed of synchronization is 
in general much faster. 

The organization of the paper is as follows: Section ^presents the synchro- 
nization task for synchronous cellular automata. Asynchronous automata are 
presented in Section 0 Section I i brieflv presents some measures that allow a 
better comparison of the experiments. A hardware-based implementation of the 
firefly synchronization task on the BioWall’s cellular tissue is presented in Sec- 
tion n.'Zi The CA implemented on the Bio Wall uses co-evolution in a completely 
local manner, each cell having access only to its immediate neighbors’ states, 
to And a solution to the task. Section H. 8 1 presents an alternative to the syn- 
chronous application of the genetic algorithms to each cell: the algorithm works 
asynchronously. Finally, Section ^^successfully applies the co-evolutionary ap- 
proach to random boolean networks (RBN). Section 0 concludes the paper. 

2 The Synchronization Task 

for Synchronous Cellular Automata 

The one-dimensional synchronization task for synchronous CA was introduced 
by Das et al. m and studied among others by Hordijk [EH and Sipper En- 
in this task the two-state one-dimensional CA, given any initial configuration, 
must reach a final configuration, within M time steps, that oscillates between 
all Os and all Is on successive time steps. The whole automaton is then globally 
synchronized. Synchronization comprises a non-trivial computation for a small- 
radius CA: all cells must coordinate with all the other cells while having only 
a very local view of its neighbors. The existence of the global clock, though 
not without consequences as we will see later, should not lure us into thinking 
that the task is straightforward. Obviously, the fact that the whole computation 
should only occur within a two-state machine prevents any counting. 

In this paper, we concentrate on the the two-dimensional version of the task 
for non-uniform CAs and random boolean networks. The two-dimensional ver- 



On Fireflies, Cellular Systems, and Evolware 



3 



i' 






% 






|E^ 










f 





14 15 16 17 IS 19 JO 

21 22 n 21 25 26 27 

•M JO 31 X2 X\ ^ 

35 36 37 38 3!> 40 -n 

Fig. 1. The two-dimensional synchronization task for synchronous CA. Source: [ U/f . 




sion of the task, see Figure Q is identical to the one-dimensional version except 
that the necessary number of time steps granted to synchronize is not any more 
in the order of N, the number of cells in the automaton, but in the order of 
n + m where n, m are the size of each side of the CA. Thereby, the speed of syn- 
chronization is much faster. Non-uniform CAs are CAs where each automaton 
may have a different rule. 

Obviously, in the non-uniform case there is an immediate solution consisting 
of a unique ‘master’ rule, alternating between ‘0’ and ‘1’, whatever the neighbor- 
hood, and all other cells being its ‘slave’ and alternating according to its right or 
up neighbor state only. However, Sipper m used non-uniform CAs to find per- 
fect synchronizing CAs only by means of evolution. It appeared that this “basic” 
solution was never found by evolution and, in fact, the “master” or “blind” rule 
10101010 (rule 170) in one dimension, was never part of the evolved solutions. 
This is simply due to the fact that this rule has to be unique for the solution to 
be perfect, which is contradictory to the natural tendency of the evolutionary 
algorithm used, as we demonstrated in [0. In this paper, we only use evolved 
solutions to this task. 



3 Asynchronous Cellular Automata 

In the preceding section the CA model is synchronous as is traditionally the case. 
But this feature is far from being realistic from a biological point of view as well 
as from a computational point of view. As for the former, it is quite evident 
that there is no accurate global synchronization in nature. As for the latter, it 
turns out to be impossible to maintain a large network of automata globally 
synchronous in practice. Besides, it is interesting to try to delineate what part 
of the computation relies on the synchrony constraint. 




4 



Christof Teuscher and Mathieu S. Capcarrere 



Ingerson and Buwel m compared the different dynamical behavior of syn- 
chronous and asynchronous CAs; they argued that some of the apparent self- 
organization of CAs is an artifact of the synchronization of the clocks. Wol- 
fram m noted that asynchronous updating makes it more difficult for infor- 
mation to propagate through the CA and that, furthermore, such CAs may be 
harder to analyze. Asynchronous CAs have also been discussed from a problem- 
solving and/or Artificial Life standpoint in references IH^i lTI E^iroilT^Hbl . All 
these works devoted to asynchronous cellular automata only concentrated on 
the study of the effects but not on correcting asynchrony or dealing with it. 
From a theoretical computer science perspective, Zielonka introduced 

the concept of asynchronous cellular automata. Though the question attracted 
quite some interest JEHIESIE]; the essential idea behind them was to prove that 
they were “equivalent” to synchronous CA in the sense that, for instance, they 
recognize the same trace languages or could decide emptiness. From these two 
fields, we thus knew that asynchronous CA are potentially as powerful as syn- 
chronous CA, computationally speaking and, nevertheless, that most of the 
effects observed in the synchronous case are artifacts of the global clock. 

In [0^ we proposed asynchronous CAs exhibiting the same behavior as syn- 
chronous CAs through both design and evolution. The main idea behind these 
models was that time is part of the information contained in a CA configura- 
tion. Hence suppressing the global clock constraint is equivalent to suppressing 
information. Thereby, and quite naturally, if we are to maintain the capability of 
the CA, then we must compensate for that loss of information by adding extra 
states. As shown then, it is possible to design a CA with 3 * states working 
asynchronously which simulates perfectly a given q-staie CA. Tolerating some 
loss of information, we have shown that it was possible to evolve self-correcting 
4-state CA to do the synchronization task under low asynchrony. 

As Gacs cniEi] reminds us, asynchrony may be considered as a special case 
of fault-tolerance. However, even though this consideration is nice in its gen- 
eralization (i.e., a fault-tolerant CA is also asynchronous), it eschews a lot of 
potential optimization. 



4 Implementations and Experiments 

4.1 Genotypic Measures for Non-uniform CAs 

To better compare the different simulations and implementations, we define a 
number of metrics (some of them adapted from 0). 

Let’s first define N to be the total number of cells. We have N = n*m where 
m, n are the length of the sides of the two-dimensional CA considered. We can 
now define the frequency of transitions, v, as the number of borders between 
homogeneous blocks of cells having the same genotype, divided by the number 
of distinct couples of adjacent cells. Adjency is here taken along the neighborhood 

^ In the traditional sense, not in the sense of visual computation. 
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used, that is the von Neumann neighborhood. Thus v is the probability that two 
adjacent cells have a different rule (Equation 0. 



can also define a more global measure of the diversity of the different genotypes 
encountered, an entropy, H (Equation 0. 



where F is the set of all possible genotypes (rules), and g(r) is the global pro- 
portion of the genotype (rule) r in the cellular automata. Obviously, FI takes on 
values in the interval [0,logn] and attains its maximum, H(x) = logn, when x 
comprises n different genotypes. H is usually normalized to take values in the 
interval [0,1]. 

4.2 Firefly Hardware 

Fireflies synchronize according to the “pulse coupled” system [EU, i.e., they in- 
teract only when one sees the sudden flash of another. The firefly then shifts 
its rhythm accordingly. “Pulse coupling is difficult to handle mathematically be- 
cause it introduces discontinuous behavior into the otherwise continuous model 
and so stymies most of the standard mathematical techniques” m- Mathemati- 
cians have turned to the theory of symmetry breaking to tackle the complex 
problems that arise when identical oscillators are coupled. Computer scientists, 
on the other hand, often use cellular automata, irregular networks and grid ar- 
rays or more complex agent-based models to analyze the behavior of interacting 
elements. 

In 1997, the Logic Systems Laboratory presented an evolving hardware sys- 
tem called Firefly in] that successfully solved the synchronization task in one 
dimension using cellular programming based co-evolution. The novelty of Firefly 
was that it operates with no reference to an external device (such as a com- 
puter that carries out genetic operators) thereby exhibiting online autonomous 
evolution. 

Recently, the synchronization task in two dimensions has successfully been 
evolved on the BioWall’s 3200 FPGAs. The BioWall [flOJ is a giant reconfigurable 
computing tissue developed to implement machines according to the principles 
of the Embryonics (embryonic electronics) project |E!]. The BioWall’s size and 
features are designed for public exhibition, but at the same time it represents 
an invaluable research tool, particularly since its complete programmability and 
cellular structure are extremely well adapted to the implementation of many 
different kinds of bio-inspired systems. The implementation on the BioWall con- 
sists of a two-state, non-uniform CA, in which each cell (i.e., each FPGA of the 
BioWall) may contain a different rule. The cells rule tables are encoded as a 
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Fig. 2. Von Neumann neighborhood of our two-dimensional CA and cell rule encoding. 



bit-string, known as the genome, that has a length of 2® = 32 bits for our 2D 
CA since the binary CA has a neighborhood of 5 (see also Figure 0. 

Rather than to employ a population of evolving CAs, our algorithm (see 
Algorithm 0 evolves a single, non-uniform CA of the size of the entire BioWall 
(one cell of the CA in each unit of the BioWall, that is, 3200 cells), whose rules 
are initialized at random. Initial configurations are then randomly generated 
and for each configuration the CA is run for M time steps (M has been set 
to 2(n + m) = 2(160 -I- 20) = 360, where n,m are the size of each side of the 
CA). Each cell’s fitness is accumulated over C {C has been set to 300) initial 
configurations. The (local) fitness score for the synchronization task is assigned 
to each cell by considering the last four time steps (M -|- 1 to M -|- 4): if the 
sequence of states over these steps is precisely 0 — 1 — 0 — 1, the cell’s fitness 
score is 1, otherwise this score is 0. 

After every C configurations the rules are evolved through crossover and 
mutation. This evolutionary process is performed in a completely local manner, 
that is, genetic operators are applied only between directly connected cells. Un- 
like standard genetic algorithms, where a population of independent problem 
solutions globally evolves, our approach involves a grid of rules that co-evolves 
locally. The CA implemented on the BioWall performs computations in a com- 
pletely local manner, each cell having access to its immediate neighbors’ states 
only. In addition, the evolutionary process is also completely local, since the 
application of the genetic operators as well as the fitness assignment occurs 
locally. Using the above-described cellular programming approach (see also Al- 
gorithm Q on the BioWall, we have shown that a non-uniform CA of radius 1 
can be evolved to successfully solve the synchronization task. In addition, once 
a set of successful rules has been found, our machine allows the state of each CA 
cell to be changed by pressing on its membrane. The user can then observe how 
the machine synchronizes the 3200 cells. 

In contrast to the original Firefly machine [Eg, which used a slightly simpli- 
fied algorithm to facilitate the implementation, the BioWall implements exactly 
the algorithm as described above. The BioWall implementation, however, con- 
tains no global synchronization detector, therefore, it is not 100% guaranteed 
that the CA always synchronizes perfectly (i.e., for any initial configuration). 
Each cell contains a pseudo-random generator, realized by means of a linear 
feedback shift register, to initialize randomly the cell’s state and its rule table. 
The register receives an initial random seed from an external PC that also con- 
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Algorithm 1 Cellular programming pseudo-code for the synchronization of a 

two-dimensional cellular automaton 

for each cell i in CA [in parallel] do 
Initialize rule table of cell i randomly 
fi — 0 [fitness value] 
end for 

c = 0 [initial configurations counter] 
while not done do 

Generate a random initial configuration 

Run CA on initial configuration for M = 2(m -|- n) time steps 
for each cell i in CA [in parallel] do 

if cell i is in the correct final state then 
fi = fi + i 

end if 
end for 

c = c -I- 1 

if c mod C = 0 [evolve every C configurations] then 
for each cell i [in parallel] do 

Compute nfi{c) [number of fitter neighbors[ 
if nfi{c) = 0 then 

Rule i is left unchanged 
else if nfi = 1 then 

Replace rule i with the fitter neighboring rule, followed by mutation 
else if nfi > 1 then 

Replace rule i with the crossover of the two fittest neighboring rules (ran- 
domly chosen, if equal fitness), followed by mutation 

end if 
fi = 0 
end for 
end if 
end while 



figures the FPGAs. The actual implementation easily fits into a Xilinx Spartan 
XCSIOXL FPGA. 

As an example, Figure 0 shows the evolution of the fitness as well as the 
number of rules in function of the number of evolutionary phases for a small 
4x4 cell GA (toroidal). A solution that perfectly synchronizes the 16 cells 
from any initial configuration (there are 2^® = 65536) has been found with the 
following five rules: 



99ea2e68 

99ea2e68 

99ea2e68 

99e82e68 



81f2a5fa 

81f2a5fa 

81f2a5fa 

81f2a5fa 



f e7aeb98 
f e7aeb98 
f e7aeb98 
f e7aeb98 



bf82fa06 

bf82fa06 

bf82fa06 

bf82fa06 



As stated in Section 0 there is a trivial solution with two rules (one cell 
being the ’master’ and all the other cells being ’slaves’), but it is very unlikely 
that this configuration is found by evolution since this rule has to be unique for 
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Fig. 3. The CA’s fitness (on the left) and the number of different rules (on the right) 
of a small 4x4 cell CA (means over several runs). The CA successfully synchronizes 
for any initial configuration using five different rules. 



Algorithm 2 Cellular programming pseudo-code for the synchronization of a 
two-dimensional cellular automaton using asynchronous co-evolution 
for each cell i in CA [in parallel] do 
Initialize rule table of cell i randomly 

end for 

while not done do 

Generate a random initial configuration 
for D steps do 

Run CA on initial configuration for T time steps 
Choose a cell i at random 
Compute fitness fi and fitness of adjacent cells 
Apply GA to cell i [crossover and mutation] 

end for 
end while 



the solution to be perfect. The above example has a, v of 0.563, the entropy H 
is 0.562. 



4.3 Asynchronous Co-evolution 

In the previous section we described the co-evolution of a synchronous CA where 
the genetic algorithm in each cell is synchronized by means of a global signal: 
all cells evaluated the fitness at the same time, replaced the rules at the same 
time, etc. It would obviously be interesting to use some kind of ’asynchronous’ co- 
evolution where cells still operate synchronously, but where the genetic algorithm 
operates asynchronously and randomly in time. This would allow to remove the 
GA’s global synchronization signal. 

We used the method as shown in Algorithm Q 

Our simulations (with T = lOA, D = N) have revealed that the CA is 
still able to synchronize perfectly, although more different rules are required and 
synchronization seems slower in general. Evolution found eight different rules for 
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Fig. 4. A random graph with 4 incoming links per node/cell. The eight rules that allow 
to perfectly synchronize any initial configuration are written on the nodes. 



the example of the previous section. Naturally, the probability that two adjacent 
cells have a different rule was higher: v = 0.969. The entropy H was 0.938. 



4.4 The Synchronization Task for Random Boolean Networks 

Solving the synchronization task on a completely regular grid abstracts from 
the original firefly synchronization task in nature and it would thus be more 
interesting and biologically plausible to use a random graph. To avoid problems 
during crossover between rules having different lengths, the number of incoming 
links per node/cell was held constant. Figure ^ shows a random graph with 
four incoming links per node. Using Stuart Kauffman’s terminology (see for 
example the graph is a fV = 16, iC = 4 random boolean network. In this 

context, K specifies the exact number of inputs per node. 

Using Algorithm Q our co-evolutionary approach was able to find a perfect 
solution using eight different rules {ly = 0.859, H = 0.652) for the K = 4 
random boolean network (RBN) of Figure El Figure Elshows the evolution of the 
number of rules for a AT = 4 (on the left) and for a K = 2 RBN (on the right). 
It is astonishing that one rule is in general sufficient for the K = 2 network 
{v = 0, H — 0). Further investigations would be needed, but it almost certainly 
has something to do with Kauffman’s findings that the most highly organized 
behavior appeared to occur in networks where each node receives inputs from 
two other nodes {K = 2). It turned out that Kauffman’s networks exhibit three 
major regimes of behavior: ordered (solid), complex (liquid), and chaotic (gas). 
The most complex and interesting dynamics correspond to the liquid interface, 
the boundary between order and chaos. In the ordered regime, little computation 
can occur. In the chaotic phase, dynamics are too disordered to be useful. 
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Fig. 5. The number of different rules (on the left) oi a. K = A random boolean network 
(RBN) and of a if = 2 RBN (on the right). The if = 2 network synchronizes with one 
rule only, the fe = 4 network with eight rules. 



5 Conclusion and Future Work 

In this paper we considered the issue of evolving two-dimensional cellular au- 
tomata as well as random boolean networks to solve the firefly synchronization 
task. The task has been solved using a co-evolutionary genetic algorithm that 
performs computations in a completely local manner, each cell having access to 
its immediate neighbors’ states only. 

The FPGA-based implementation on the BioWall’s cellular tissue exhibits 
complete online co-evolution using local interactions only between neighboring 
cells. Besides the global clock signals, no other global signal is needed. 

Our experiments have shown that the cellular programming based approach 
is very efficient, easily finds a solution for classical two-dimensional cellular au- 
tomata as well as for random boolean networks, and is easily implementable on 
cellular hardware (e.g. FPGAs or arrays of FPGAs) since there are - except the 
global clock signal - no other global signals needed. 

Future work will be focused on the firefly synchronization of completely asyn- 
chronous cellular automata and of random boolean networks having a different 
number of inputs for each cell. This would allow to remove all global signals and 
to use local clocks only, which would be an important additional advantage for 
Evolware implementations. Another focus point will be the synchronization of 
imperfect (lossy) 2D GAs and networks similar to the recent work of Ghallet and 
Johnson 0 (see also 0) who have shown how imprecise clocks can keep good 
time by working together. 
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Abstract. Evolvable hardware represents an emerging field in which evolution- 
ary design has recently produced promising results. However, the choice of ef- 
fective circuit representation is inexplicit. In this paper, we compare different 
circuit representations for evolutionary analog circuit design. The results indi- 
cate that the design quality is better for the element-list circuit representation. 



1 Introduction 

Circuit design is a difficult problem because of buge design space dimensions [1]. 
Different circuits are generated by a variation of parameters and topologies. Mainly 
they are useless, while the small part of them corresponds to a design goal. An effec- 
tive search in such large complex spaces is impossible without smart tools [5]. One of 
the approaches to circuit design is a restriction of design space by means of expert 
knowledge. However, an expert circuit designer uses the part of possible circuit solu- 
tions that have been found by chance or by a search within tiny regions. 

Recently, automated analog circuit design techniques have been proposed. Their 
advantage is a search within the whole space, but it requires large computational ef- 
forts. There are several approaches to a reduction of design space by means of mac- 
romodelling, symbolic analysis etc. [3]. However, the application of evolutionary 
algorithms for circuit design seems the most attractive approach, which allows us to 
obtain novel circuit solutions. Many positive results have been published [1-15]. 
However, the choice of the best structure of evolutionary design is unclear. Circuit 
representation, fitness evaluation technique, and search method have to be chosen for 
effective evolutionary circuit design carefully. However, their choice is inexplicit in 
practice. 

In this paper we aim to compare different circuit representations for static fitness 
schedule and distribution algorithms. Univariate and bivariate distribution algorithms 
use estimations of the probability. Hence, a reasonable choice of circuit representation 
has a profound result for effective circuit design. We compare the effectiveness of 
branch-list circuit representation and element-list circuit representation for univariate 
algorithms. We use lowpass filter benchmark. 

The remainder of the paper is organized as follows. Section 2 summaries briefly 
recent results obtained for evolutionary circuit design. Section 3 elaborates on the 
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circuit representations. The experimental setup and results are presented in Sections 4 
and 5. We discuss the results in Section 6. 



2 Approaches to Evolutionary Circuit Design 

Evolutionary algorithms are used both for digital circuit design and analog circuit 
design [1], Circuit representation, fitness evaluation technique, and search algorithms 
are crucial ingredients of evolutionary circuit design. In this paper, we restrict our 
research to analog circuit design, although there are the same problems for digital 
circuit design as well. 

A correct circuit representation is the base for effective design. There are many ap- 
proaches to circuit description. Tree representation is used by the growing technique 
[8], which allows us to obtain a sequence of circuits by an addition or a deletion of 
circuit elements. The deficiencies of the approach are circuits of illegal topologies that 
are generated at each step. Analysis of useless solutions requires computational ef- 
forts, while solutions have the right topology only. A generation of certain topologies 
using cc-bot instructions restricts the design space [4]. This limitation has both advan- 
tages and deficiencies. The computation efforts are reduced, whilst a circuit variety is 
restricted as well. The matrix representation [11] requires preliminary knowledge. It 
should be noted that it is used for fixed circuit sizes only. The integer representation, 
which is based on the linear string, has been used effectively [7, 11]. 

The complexity of evolutionary design is defined by the ruggedness of fitness 
landscape. Some investigations of fitness landscape changes have been performed. 
Dynamic fitness schedules change the fitness landscape [6], but their deficiency is a 
varying gradient of the fitness function landscape. For the following generations the 
information obtained is false. The fitness evaluation technique [13] varies for small 
and large circuits by means of penalty coefficients. Therefore, more efficient solutions 
are selected at each step. 

Note that the right choice of search algorithm represents the most important deci- 
sion for effective design. There are many publications that represent the applications 
of genetic programming, genetic algorithms, and evolutionary strategies for circuit 
design [8 - 10]. Specific mutation operators for analog circuit design have been used 
in [12]. Some researchers combine genetic search and annealing optimisation [3], 
genetic algorithms and Davidon-Fletcher-Powell optimisation method [14]. The posi- 
tive results have been obtained for the hardware immune system [2]. Recently, ant 
colony algorithm has been used for analog circuit design successfully [15]. Our focus 
is on the application of probabilistic evolutionary models. The Estimation of Distribu- 
tion Algorithms EDA has been proposed by Miihlenbein and PaaB [16] as an exten- 
sion of genetic algorithms. Instead of performing recombination of strings, EDA 
generates new points according to the probability distribution defined by the selected 
points. In [17] Miihlenbein showed that genetic algorithms can be approximated by an 
algorithm using univariate marginal distributions only (UMDA). UMDA is an evolu- 
tionary algorithm, which combines mutation and recombination by means of distribu- 
tion. The distribution is estimated from a set of selected points. It is used then to gen- 
erate new points for the next generation. It should be noted that we use truncation 
selection because of its simplicity. Some experimental results of the distribution algo- 
rithms application for several test functions have been published in [18]. Recently, we 
have proposed the dynamic coding, the hierarchy circuit representation, and the appli- 
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cation of UMDA for analog circuit design that allow us to use topological and para- 
metrical search simultaneously [19]. Observe that the hierarchical genotype structure 
reduces the search space without the restriction of circuit variety. In order to improve 
design abilities, mutation has been introduced into UMDA by a concept called Bayes- 
ian prior [18]. The choice of hyperparameter r has been considered for test functions 
[18]. We have improved the effectiveness of circuit design by means of mutation too 
[20]. UMDA with Bayesian prior is able to overcome local minima. Furthermore, we 
used the experimental research in order to make a reasonable choice of Bayesian prior 
for analog circuit design [21]. Moreover, we synthesized the circuits that are better 
than those produced by an expert circuit designer [20, 21]. 

The probabilistic approach evaluates the direct relation between the genotype and 
the fitness. In this paper we compare different circuit representations for the applica- 
tion of UMDA for analog circuit design. It should be noted that we restrict our re- 
search to the static technique of fitness function evaluation only. 



3 Circuit Representations 



There are many circuit representation techniques for evolutionary circuit design. The 
hierarchical circuit representation [19 - 21] was chosen as the basis for the investiga- 
tion for the following reasons. First, as a large number of circuits are needed to evalu- 
ate, the search within the whole space will be very slow. Second, the relations be- 
tween the circuit elements and the circuit parameters cannot be estimated for different 
elements. Physical nature is different for capacitances and inductances. Therefore, 
probabilistic models have to be evaluated for capacitance parameters and for induc- 
tance parameters separately. 

The growing technique is used during the evolutionary design. The embryonic cir- 
cuit is shown in Fig. 1. The evolutionary process is based on the topology of the com- 
plete graph. Adjacency matrix of the A- node complete graph is given as follows [22]: 



A = 



aij = 0 

fly=l 



if 

if 



i = j 
i^j 



( 1 ) 



First, we specify the boundary limits on the amount of the circuit nodes. Second, 
each element is determined by its type and parameter. Two bits define the type. Each 
element can take on 3 different types, namely, switch (0), capacitance (1) and induc- 
tance (2). Three bits define a parameter of element. The parameter of switch can take 
on two fixed values only. We consider that 0 means that a switch is turned off and 1 
means that a switch is turned on. Parameters of inductances and capacitances are 
chosen from the given range. The input of the circuit is always node 1 and the output 
is node 2. Branch 1 is always series connection of a voltage source and the source 
resistance. Branch 2 is the load resistance. The circuit being evolved contains (n-2) 
branches. The amount of the branches is given as follows: 

A^-1 

n= T(N-i). 

i=0 



( 2 ) 
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Fig. 2. General structure of the branch circuit representation Genotype 1 (a), general structure of the 
element circuit representation Genotype2 (b) 

We propose two circuit representations that use the adjacency matrix (1). Figure 2 
shows their general structure. Figure 3 illustrates the case of the 3 node topology. 
Chromosome length is equal to 23. The best circuit is shown in Fig. 3 (c). 



3.1 Branch-List Circuit Representation 

The branch-list circuit representation is one of standard circuit representation tech- 
niques for evolutionary analog circuit design. A genotype is formed by means of the 
combination of separate genes for each branch. The circuit representation used is 
similar to [7, 11, 12, 19]. Briefly, each gene consists of alleles for a type, connecting 
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nodes and value parameters. The genotype [type, connecting nodes, value parameters] 
[11, 12], the genotype [connecting nodes, value parameters, type] [7], and the hierar- 
chical genotype [type / value parameters] [19] are possible modifications. Fig. 2 (a) 
shows general structure of the branch-list circuit representation (Genotypel). The 
description of Genotypel for the chromosome of length 23 is illustrated in Fig. 3 (a). 
The correspondent adjacency matrix is given as follows: 

1 1 1 

0 1 1 

1 0 1 - ® 

1 1 0 
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Fig. 3. The branch circuit representation Genotypel (a), the element circuit representation 
Genotype2 (b), and the best circuit fc) 
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3.2 Element-List Circuit Representation 

The element-list circuit representation is a circuit representation modification. It 
should be noted that topology information is more important for circuit design. Thus, 
we suppose that a change of circuit representation could improve the effectiveness of 
evolutionary search. The topological information is concentrated within the first al- 
leles, whilst the rest of genotype holds alleles for the value parameters. Fig. 2 (b) 
shows general structure of the element - list circuit representation (Genotype2). The 
description of Genotype2 for the chromosome of length 23 is illustrated in Fig. 3 (b). 

The correspondent adjacency matrix is given by (3). Such representation is effec- 
tive when the ruggedness of the fitness landscape is reduced. Otherwise it is disadvan- 
tageous. Fig. 3 indicates that the phenotypes are the same ones, whilst the genotypes 
are different. Hence, the interactions between the alleles are different as well. There- 
fore, the efficiency of evolutionary design can be changed. In order to determine the 
deficiencies of different circuit representation it is useful to compare their behaviour. 



4 Experimental Setup 

The circuit design benchmark chosen was the lowpass filter. Such circuits have wide 
applications, although the design theory is based on some simple circuits. The starting 
point for the design of a filter is the specification of the frequencies for the filter’s 
pass-band and stop-band. The design of filters with high stop-band attenuation and 
low pass-band attenuation is hard. It is more difficult to design a lowpass filter with 
small transitional region between two bands. The input specification for the filter 
design benchmark is summarized in Table 1. The ratio between signal in the pass- 
band (IV) and the maximum acceptable signal in the stop-band (1 mV) is 60 deci- 
bels, although there is 2-to-l ratio between the stop-band frequency and the pass-band 
frequency. The known design methods generate the desired circuit, although there are 
some fluctuations in the stop-band and in the pass-band. 



Table 1. Input specifications for the filter design benchmark 



Circuit Specifications 


Values 


Minimum number of components 


3 


Maximum number of components 


6 


Pass-band frequency (Hz) 


1000 


Stop-band frequency (Hz) 


2000 


Maximum pass-band gain (dB) 


0 


Minimum pass-band gain (dB) 


-0.26 


Maximum stop-band gain (dB) 


-60 



In order to reduce computational efforts the fitness is evaluated by means of equa- 
tion-based method [20]. The numerical frequency response evaluation and the modi- 
fied nodal method are used. The fitness function evaluation technique used is similar 
to those described in [20, 21]. The fitness is evaluated in terms of the sum of the abso- 
lute weighted difference between the actual output voltage and the target one. These 
error values were summed across evaluation points Nc for the evaluation of the fitness 
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F = 



Nc 



/=0 



( 4 ) 






( 5 ) 



md{f),n 



1 if d(f) <0.03, f <1000 Hz, 

10 if d(f)> 0.03, f <1000 Hz, 

1 if d(f) <0.001, f> 2000 Hz, 
10 if d(f) <0.001, f> 2000 Hz. 



(6) 



where / is the frequency of the fitness evaluation, d(f) is the absolute value of the 
difference between the output voltage and the target at the frequency / and W( d(f), f) 
is the penalty factor for the difference d(f) at the frequency/. 

The set of circuits that is generated within the topology of the 3 node complete 
graph is examined to show how success rate is affected by different circuit representa- 
tions. The best circuit is given in Fig. 3 (c). The amount of hits is equal to 86. We 
have found [20] that the effective design is supplied by a big ratio between the popu- 
lation size and the genotype length. It should be noted that it is impossible to use a 
theoretical framework to determine an effective circuit representation. In order to 
make a fair comparison, we have to compare actual simulation results. 



5 Experimental Results 

We examined the behaviour of different circuit representations for the fixed number 
of bits n=23 with a truncation threshold x from 0.02 to 0.04. Population size changes 
from N=300 to N=500. We consider the successful runs after 10% of the population 
consisted of the best solution only. Figures 4 -7 show how the success rate and the 
function evaluations are changed when a circuit representation is modified. The re- 
sults have been obtained under assumption that the amount of iterations is less than 
the genotype length. 

Figures 4, 5 illustrate the case in which the population size varies between N=300 
and N=500 and the truncation threshold is equal to x=0.02. Figures 6, 7 show the 
case in which the population size is fixed N=300, whilst the truncation threshold is 
increased to x=0.04. The selection intensity I depends on a truncation threshold of x 
[17]. The most obvious fact is that the success rate is generally enhanced where the 
element-list circuit representation is used. The branch-list circuit description is less 
efficient. It is remarkable that the same number of function evaluations is required for 
two circuit representations approximately. Therefore, the curves have shown that the 
element-list circuit representation allows us to increase the success rate without the 
significant computational expenses. 
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Fig. 4. Performance comparison with varying mutation rate r for different population sizes and 
truncation selection with x=0.02. Succ is the number of times, when the best solution was 
found in 100 runs 




Fig. 5. Performance comparison with varying mutation rate r for different population sizes and 
truncation selection with x=0.02. FE is the number of evaluation until 10% of the populations 
consisted of the best solution 



6 Discussions 

Two different types of the circuit representation have been presented. The results 
show that the element-list circuit representation is better than the branch-list circuit 
representation. We are able to exploit larger population size and truncation selection 
with smaller x more effectively using the element-list circuit representation. There- 
fore, the assignment of topology information and parameters to the correspondent part 
of the genotype allow us to improve the effectiveness of evolutionary analog circuit 
design. 
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Fig. 6. Performance comparison with varying mutation rate r for the population size N=300 and 
truncation selection with x=0.04. Succ is the number of times, when the best solution was 
found in 100 runs 




m/(INT) 

Fig. 7. Performance comparison with varying mutation rate r for the population size N=300 and 
truncation selection with x=0.04. FE is the number of evaluation until 10% of the populations 
consisted of the best solution 
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Abstract. The communication mechanism used by the biological cells 
of higher animals is an integral part of an organisms ability to tolerate 
cell deficiency or loss. The massive redundancy found at the cellular level 
is fully taken advantage of by the biological endocrinologic processes. 
Endocrinology, the study of intercellular communication, involves the 
mediation of chemical messengers called hormones to stimulate or inhibit 
intracellular processes. 

This paper presents a software model of a multiprocessor system de- 
sign that uses an interprocessor communication system similar to the 
endocrine system. The feedback mechanisms that govern the concen- 
tration of hormones are mimicked to control data and control packets 
between processors. The system is able to perform arbitrary dataflow 
processing. Each processing stage within the system is undertaken by a 
separate group of microprocessors. The flow of data, and the activation 
of the next stage within the process is undertaken using the bio-inspired 
communication technique. The desired result is a system capable of main- 
tained operation despite processor loss. The feasibility of the multipro- 
cessor system is demonstrated by using the model to perform a simple 
mathematical calculation on a stream of input data. 



1 Introduction 

Biology provides a diverse source of inspiration that reaches across many fields 
from both artistic and scientific disciplines. Electronic Engineering is no excep- 
tion. 

One of the strongest impetuses for Bio-Inspired Engineering is the assistance 
it lends to the development of engineering design, especially where solutions 
via traditional techniques fall short. Such biological based solutions have led to 
the creation of artificial learning and pattern matching systems based on neural 
networks as well as self re-configuring systems based on biology’s ability 
to evolve and adapt. 

Of more interest to the area of Reliability Engineering is biology’s ability to 
maintain operation in the face of adverse conditions. Biology is able to employ 
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automated fault tolerance, detection and recovery characteristics that enable 
organisms to remain functional despite injury. It is the desire for electronics to 
also exhibit these three characteristics that has led to an interest in Bio-Inspired 
Reliability Engineering. 

Both Embryonics [|] and Artificial Immune Systems m mimic aspects of 
biological reliability systems. The Embryonic Architecture is capable of removing 
faulty circuit areas. Reconfiguration via the shifting of functionality to healthy 
circuitry in redundant areas returns the system to full functionality. Whereas 
Artificial Immune Systems provide a fault detection and removal mechanism 
based on the biological self, non-self principle [0. 

The ability of biological systems to tolerate and recover from a subsystems 
death is reliant upon the use of redundancy. This is a common feature present 
on a number of levels within biology’s structural hierarchy. Society maintains 
activity through redundancy in individual organisms, similarly organisms can 
function without certain organs. However, it is the cellular level that utilizes 
redundancy to the greatest effect. 

Cellular biology of higher animals provides the inspiration for the multipro- 
cessor system presented in this paper. A software model of an inter-processor 
communication system based on biology’s endocrine system is presented, includ- 
ing results that demonstrate the system’s fault tolerant characteristics. 

The subject of biological cell signalling is discussed in Section 0 How such 
signalling can be translated into a useful electronic system is presented in Section 
0 The model of the resulting multiprocessor system and operation results are 
described in Sections 0and 0 respectively. The paper is concluded in Section 0 
with suggestion for further work in Section 0 

2 Cell Signalling 

Cells are heavily dependant on signalling mechanisms for survival. The ability for 
cells to influence each other enables a multicellular organism to maintain a level 
of homeostasisQ Even some unicellular organisms utilise signalling to influence 
proliferation of other like cells [0. 

A variety of different communication systems are employed in higher ani- 
mals. Each system varies in a number of ways, but especially with regard to 
range and speed. However, in each case communication is achieved via signalling 
molecules. Each messenger molecule exhibits a biological signature that deter- 
mines its recognition by other cells. Target cells recognize messenger signatures 
through receptors. A receptor-messenger match allows molecules to bind to their 
target and complete the communication process. 

The synaptic signalling process shown in Figure 0is the most directed mes- 
senger based communication system. Nerve cells directly steer their messengers 
via their connected axons. As the neurotransmitter messengers are released so 
close to their target, affinity between receptor and messenger can be low. In 

^ Homeostasis is the process by which organisms or cells maintain a stable internal 
equilibrium via physiological changes. 
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Fig. 1. Synaptic signalling via neurotransmitters. 




(a) Endocrine Signalling (b) Paracrine Signalling 

Fig. 2. Two types of cell signalling, (a) Endocrine communication, hormones are trans- 
ported via the blood stream between source and target cells, (b) Paracrine communi- 
cation, local mediator messengers are restricted to local tissue diffusion to travel from 
source to target cells. 



contrast, messengers used in endocrine and paracrine signalling are not directed 
straight to their target. Endocrine signalling, as shown in Figure Eli, transport 
their messengers (called hormones) through the blood stream. Paracrine sig- 
nalling, Figure Etj is similar, however the message chemicals are limited to a 
localized area of tissue. 

2.1 Endocrinologic Control 

Homeostasis of higher animals is achieved by the body’s nervous and endocrine 
systems |E]. Mediated by their respective communication systems, changes in an 
organism’s internal or external environment invoke responses that stimulate the 
appropriate physiological changes required for organism adaption. 

In the case of the endocrine system, hormones released from special glands 
are able to stimulate cells, activating a variety of reactions. These reactions may 
involve the production of other hormones that in turn activate other cells. 

It is possible for cells with hormone receptors with different specificities to 
bind to the same hormone signature. Therefore, the release of a single hormone 
type is able to initiate a response in more than one type of target tissue. Also, 
hormones are not limited to stimulatory effects. They may also produce an in- 
hibitory effect on their target. These combined abilities of hormones provide the 
foundation for hormone based control. Figure Qdepicts two possible scenarios 
for endocrine control. 
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Fig. 3. Two possible scenarios for endocrine based control. 



In Figure Et, the release of hormones by the gland initiates a response by 
the target cells. The response itself then stimulates or inhibits the original gland 
to increase or decrease the production and secretion of hormones. The second 
system, shown in Figure EJ>, contains a second stage. The initiating gland pro- 
duces an initial set of hormones that activate the first group of target cells. The 
resulting action is the production of a second set of hormones that target both 
the original gland and a second group of target cells. The gland is inhibited by 
these hormones to stop its own hormone production, whereas the second cell 
group is stimulated producing the desired overall response. 



3 An Electronic System from Biology 

The biological endocrine system provides higher animals with a robust control 
mechanism. Unfortunately, this is all it is, it does not directly lend itself to use 
as an information processing or calculation system. However, if the hormone 
messengers contained data, and target cells performed operations on the data, 
the endocrinologic paradigm could be used to create such a system. 

Figure depicts a system adapted from the biological model shown in Fig- 
ure Eh. In this case, the system has been extended to contain a number of target 
groups. Each group, on stimulation, releases messengers that stimulate the next 
target group and inhibit the previous. The result is a cascade of data processing 
stages. The equivalent system, in block diagram form is shown in Figure 

3.1 Microcells: Microprocessor Cells 

All biological cells develop from a common ancestor type, the stem cell [II 1)J . 
When new cells are required, they can be developed and put to use. Unfortu- 
nately, physical growth of electronic hardware to replace malfunctioning circuitry 
is not feasible at the moment. Instead, resources have to be made available which 
can be brought into use, mimicking regrowth. 
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Fig. 4. Adapting the endocrine system to a data processing platform, (a) A succession 
of messenger activated microcell groups forming a chain of data operations, (b) An 
equivalent block diagram view of the system. 



In the case of the cells in the new system, a hardware unit is required that 
can assume a variety of functions. Then, in the situation that a cells needs to 
be replaced, a new generic cell can take its role, configuring itself to perform the 
required task. 

A microprocessor based cell, or microcell, is able to fulfil such a task. Each 
unit stores all the required cell procedures within memory, and on activation, 
programs itself with the appropriate section of software. This is an established 
bio-inspired technique used especially in Embryonics Ena, and in this case 
forms a neat parallel with the ability of biological cells to to develop into any 
number of roles P . 

3.2 Communication Space 

Biology exists in a non-rigid framework that allows the movement of matter 
and the constant reorganization of its constituent parts. In electronics this is 
simply impossible. The structure of the silicon within a device determines exactly 
where a component is located, and where it shall stay. This static nature of 
electronic features make replicating the endocrine communication mechanism 
difficult. Endocrine hormones must be free to move and co-exist if they are to 
reach their target effectively. 

As it is not possible to create a medium where data packets are able to freely 
roam between microcells, a more restricted solution is taken in this paper. A fully- 
connected mesh topology jTTTlT^ would allow free communication between any 
two microcells, but would result in an unrealistic number of connecting links for 
any reasonable size system. Instead, a ‘closest-neighbour’ system is used, where 
each microcell is connected to a number of its closest neighbouring microcells. 
Messenger data packets are able to diffuse across the network by passing from 
one microcell to the next. If the target is reached, the message data is consumed 
as its travel is complete, if not the message is moved on. 
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Fig. 5. A block diagram of the configuration used to test the multiprocessor system. 



3.3 Topology 

In a system so dependant on communication, the network topology is an im- 
portant consideration. To give each microcell an equal communication standing, 
each cell’s view of its 10 connections must appear the same. To achieve this, 
a boundary free topology with an equal quantity of neighbour connections per 
microcell is used. This takes the form of a toroidal mesh. However, this is an 
area for further work (see Section m 

3.4 Data Processing 

The system resulting from the conversion of the endocrine control system to 
electronic system is a simple dataflow processor. The system has a single input, 
the Stimulator and a single output, the Drain as shown in Figure 0). The mi- 
crocells that lie in between are grouped depending on the stage they represent 
in the system. Redundant microcells that do not perform operations on the data 
are also included in the system. Their role is to simply traffic data without alter- 
ation, and when required, assume a data operating function to replace the loss 
of a faulty microcell. 

4 Multiprocessor System Model 

The multiprocessor system was simulated using a multi-threaded Java Model 
To demonstrate the operation of the model a simulation of the simple 
data processing system, shown in Figure 0 has been performed. The results are 
presented in Section 0 

The network produced by the model is shown in Figure 0 Each sphere repre- 
sents a single microcell. The connections between the spheres show the connec- 
tivity of the network. In this particular network all connections are full-duplex, 
however half-duplex connections are possible. The system Stimulator and Drain 
can also be seen. 

5 Simulation Results 

The key points of the systems operation are shown in the following three Figures 
□ 0(The data for each graph were taken from the same simulation run) . The 
messaging activity of the system is complex due to the random nature in which 
microcells pass data. However, a pattern in cell activity levels, with similarities 
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Fig. 6. A representation of the multiprocessor system. Cells 0, 4 & 5 and Cells 1, 3 & 
11 perform different processing tasks. Cells 2 and 9 have been killed off to demonstrate 
fault tolerance. The remaining cells are redundant. The main system input (Stimulator) 
and output (Drain) are also shown. 



to biorhythms [E], can be observed. Figure Q shows the activity levels of cell 4 
(see Figure E). The stimulation and inhibition components that define the cells 
activity can be seen. 

Each rise and fall in activity is regulated by the initial stimulus of input 
data. Comparison of this graph and that in Figure 0show a correlation in the 
introduction of new data, and subsequent cell activation. 

Figure 0shows the overall system input and output. Each spike of Figure 
represents the activation of the system Stimulator. It can be seen that these 
spikes lead to a number of output spikes, Figure 0>- These represent the arrival 
of final stage messengers at the system Drain. Further study of the graph shows 
that the correct mathematical operations (see Figure 0 are being performed on 
the data. 

The final graph shown in Figure Qshows the effects of microcell loss. At two 
points during the simulation a microcell is killed off, and then replaced by a 
redundant cell. As Figure Qproves, operation is maintained despite the loss. 

6 Conclusion 

So far, the simulation results have been encouraging. Even though the system 
is in its very early stages of development it is able to perform simple processing 
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Fig. 7. Graph of the activation levels of microcell 4 (See Figure 0. The total acti- 
vation level (c) can be seen as the addition of the stimulatory (a) and inhibitory (b) 
components. 




50 
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150 



Fig. 8. Graph showing the overall system input (a) and output (b) data streams. 



on a flow of data. Microcell loss is also tolerable and the ability to return the 
system to a full complement of processing units is possible. 

Biological Endocrinology has been presented as another source of inspiration 
for engineering. Furthermore, this paper has made a start to show it is possible 
to glean some of its reliability properties via its replication as an electronic 
communication system. 
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Fig. 9. Graph of activity levels in four different cells as they become activated and 
deactivated, demonstrating the systems ability to deal with microcell loss. 



7 Further Work 

The development of the presented system model is currently still within its early 
stages. Consequently, there remain many features to be included and issues to 
be dealt with. The rest of this section introduces some of these. 



7.1 Different Topologies 

At present each microcell is linked to only four other neighbours. This is sup- 
ported in a toroid topology that enables the removal of boundaries from a stan- 
dard square grid. 

Other network topologies that have more intermicrocell links need to be 
tested. Extra connections should provide greater freedom of movement for mes- 
sages. Although this would bring the communication space representation closer 
to that of biology, there is a cost in extra hardware. 



7.2 Pipelining 

The system developed so far has a very limited throughput. The activation level 
of each microcell must always settle to zero between stimulations. This is required 
to avoid data from different time frames mixing and consequently invalidating 
the output data stream. Ideally, the system should perform in a similar man- 
ner to a pipeline with each microcell group performing concurrently on 

consecutive data samples. 
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7.3 Message Redirection 

The level of activity in each microcell is dependant on its location in the network. 
This is an undesirable consequence of the random nature in which the choice of 
next cell is made when passing messages. The flow of blood in a biological organ- 
ism provides a broad transport medium for hormones. Blood carries hormones 
past all the cells in the organism, thus averaging the likelihood of cells receiving 
hormones. 

The introduction of a biasing in the redirection algorithm for message trans- 
mission would make it possible to create a general direction of message flow 
within the network. This would lower the likelihood of activity localization within 
the network. 



7.4 Fault Detection and Recovery 

Ultimately, a system that has automated fault detection and recovery is desired. 
Following the design based on endocrine communication has led to a very suitable 
architecture for implementing such features. 

7.5 Real World Application 

The need for Reliability Engineering is a consequence of the use of electronics 
in Real World applications. A feasibility study of the presented system in use 
within a ’real world’ context would be an interesting area for further work. Due 
to the data flow processing nature of the system, a possible application could be 
found in the area of Automated Control Systems. 
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Abstract. In this paper, we show how artificial evolution can be used 
to improve the fault-tolerance of electronic circuits. We show that evo- 
lution is able to improve the fault tolerance of a digital circuit, given 
a known fault model. Evolution is also able to create sets of different 
circuits that, when combined into an ensemble of circuits, have reduced 
correlation in their fault pattern, and therefore improved fault tolerance. 
An important part of the algorithm used to create the circuits is a mea- 
sure of the correlation between the fault patterns of different circuits. 
Using this measure in the fitness, the circuits evolve towards different, 
highly fault-tolerant circuits. The measure also proves very useful for fit- 
ness sharing purposes. We have evolved a number of circuits for a simple 
2x3 multiplier problem, and use these to demonstrate the performance 
under different simulated fault models. 



1 Introduction 

A common technique for improving the fault tolerance of digital circuits is to 
use redundancy, e.g., multiple copies of the same circuit. Using majority voting 
within an ensemble of circuits, one or more faults in a single circuit can always 
be corrected. Faults in different circuits of the ensemble can usually be corrected 
if and only if they affect different bits in the output. Correlation between the 
circuits used in the ensemble can degrade the performance, if the same fault is 
likely to occur in all circuits. Error correction becomes less successful in this 
case. For this reason, designers generally try to increase the diversity of circuits 
used in an ensemble (Lala & Harper 1994, Mitra, Saxena & McCluskey 2001, 
Mitra, Saxena & McCluskey 1999). More advanced word-based error correction 
algorithms may be able to detect and possibly correct some faults that occur 
at the same bit (Mitra & McCluskey 2000), but fault correlation still degrades 
error correction performance. In any case, there does not exist a satisfactory and 
systematic approach to generating diverse circuits. 

Evolutionary algorithms have the advantage that they often are able to create 
a variety of different circuits that perform the same function. Moreover, given a 
fault model, correlation between the faults of different circuits can be measured, 
and used in a fitness function. In this paper, a novel approach to generating 
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a population of diverse circuits, based on evolutionary algorithms, is proposed. 
The approach provide a very different alternative towards fault-tolerant systems 
from other existing approaches. Our experimental results show that the proposed 
evolutionary approach can produce highly reliable and fault tolerant circuits. 

The rest of this paper is organised as follows. Section 0 explains the ideas 
behind our evolutionary approach. Section 0 describes the circuit behaviours 
with and without the presence of faults. Section Ogives the experimental results. 
Section ^discusses related issues to our evolutionary approach. Finally, Section 0 
concludes the paper. 

2 Evolving Fault- Tolerant Circuits 

Redundancy is often necessary in fault-tolerant circuits. However, it is unclear 
what would be the optimal way to organise redundancy. Diversity has been recog- 
nised as a key issue in designing redundant fault-tolerant circuits. However, it is 
by no means clear how to design an ensemble of diverse fault-tolerant circuits. 

Our idea is very simple (Yao & Liu 2002). We use an evolutionary algorithm 
to evolve digital circuits. The algorithm is different from others in its fitness 
function which takes into account of the correlation of the error patterns of the 
circuits. We then combine the best set of circuits from one or more evolution- 
ary runs into an ensemble (of redundant circuits). This ensemble can be tested 
under different fault models and compared to individual circuits and ensembles 
of identical circuits. The evolutionary algorithm optimises the fault tolerance of 
the set of circuits by optimising individual fault tolerance, and by maximising 
the variety in fault pattern between the circuits. Our approach therefore can be 
seen as sitting between approaches where individual circuits are directly evolved 
for fault tolerance (for example Canham & Tyrrell 2002), and approaches where 
the diversity of the whole population is used (for example Tyrrell, Hollingworth 
& Smith 2001). 

In this section, we first describe our experimental setup, the representation 
used, and other algorithm details. We then discuss how we calculate the error 
patterns and their correlation, and how they are introduced into the fitness 
function. 

To test our idea, we used a simple test setup to evolve binary arithmetic 
circuits. These circuits take two inputs of length a and b bits, and have c output 
bits. The representation we use is similar to that described in Miller, Job & 
Vassilev (2000) and V. K. Vassilev (2000). It is based on a, K columns by L rows 
grid of nodes. Each node Nk,i is programmed to perform one of a predefined 
set of binary functions. Each node has two inputs which are connected to other 
nodes. The genotype encodes, for each node, the function performed and the 
node coordinates of the two inputs for this node. In order to avoid feedback 
loops, processing is only allowed in a left-to-right order. Nodes in column Ci can 
only have inputs from other nodes in column A special column Co contains 
the inputs to the circuit in a predefined order; this column is not influenced by 
the genotype. Finally, the rightmost column Ck controls the outputs. Nodes in 
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this column are treated differently from other columns in that the function is 
ignored and the output value is directly taken from the first input. The nodes 
in the last column therefore only define the wiring of the output. In this aspect, 
our representation differs from Miller et al. (2000), where a special section of the 
genotype defines the output wiring. 

Because the inputs and outputs are in single columns, the grid needs to have 
at least as many rows as the larger of a -I- 6 (the number of inputs) and c (the 
number of outputs). Nodes in column Co that are not connected to inputs are 
fixed to 0. 

For our experiments, we evolve a 2-bit by 3-bit multiplier (a = 2, & = 3) with 
5-bit output (c = 5). We evolve the circuits on a 7x7 grid, where the 7th column is 
the output column. In order to improve the chances of evolving different circuits, 
we allow the algorithm to use a large set of different node functions, including 
and, or, xor, nand, nor. We use uniform crossover and a mutation operator that 
randomly changes one of the inputs or replaces a node function with a new 
random one. 



3 Computing Error Behaviour 

The goal of our algorithm is to create individuals that differ in their error pat- 
terns. Hence we need to compute the error patterns. There are two parts to this. 
The first part is the behaviour of the design in the absence of circuit faults. If we 
assume that the input-output mapping is completely defined, all correct circuits 
should produce the same outputs given the same inputs. However, during evo- 
lution we generally are not dealing with prefect circuits. Different circuits can 
differ strongly in the input-output mapping they implement. 

The second part is the behaviour of circuits in the presence of faults. Here, 
the goal is that different circuits will behave differently in response to damaged 
nodes in the circuit. For both computations, we look at the ‘output string’ of 
the circuit. This string contains the signal of all the outputs of the circuit for all 
possible input combinations. For example, a circuit with 5 inputs and 5 outputs 
will produce an output string of length 5*2® = 160. The ‘correct’ output string 
is defined by the function that the circuit implements. For all correct circuits 
the output string will be identical to the correct string. 



3.1 Circuit Behaviour without Faults 

The first step in the computation of the fault free behaviour is to convert the 
genotype into a circuit. The output string can then be calculated by simulating 
the circuit. By comparing the output string with the correct string, we can get an 
‘error string’ which is ‘1’ wherever the two differ, and ‘0’ otherwise. The number 
of ‘I’s in the fault string gives a straightforward measure of the quality of the 
circuit. 
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3.2 Circuit Behaviour with Faults 

The behaviour of a circuit with faults depends on the nature of the fault or 
faults. Generally, it is not possible to predict which element of a circuit will 
become faulty, and which fault exactly it will develop. Instead, a probabilistic 
fault model is used here; which describes the likelihood of a fault in each element 
of the circuit. For our experiments, we assume a very simple fault model, where 
each node in the circuit is equally likely to develop a fault where the output is 
either always 0 (‘stuck-at-zero’) or always 1 (‘stuck-at-one’). 

With a given phenotype in the node-grid description, it is possible to test 
the behaviour of the circuit with faults. To do this, we first identify which of the 
nodes in the circuit contributes to the output. These nodes, which we refer to 
as active nodes, are found by backward-chaining from the outputs to the inputs 
in the circuit, and marking the nodes used. We then exhaustively test the fault 
behaviour by going through all active nodes and switching them to ‘stuck-at-one’ 
and ‘stuck-at-zero’. A circuit with n active nodes will need 2n tests. As in the 
fault-free case, an error string indicates the difference to the correct circuit; but 
here the error vector is computed by adding a value of 1 at each position where 
the signal is incorrect for each of the 2n tests. 

After all tests, we normalise the values in the error string by dividing it by 
2n. Under the simple fault model described above, the error string gives the 
probability that for a single, random fault at one of the active nodes in the 
circuit, a particular bit in the output string is wrong. 



3.3 Error Correlation 

For the evolutionary algorithm, we are interested in the correlation of the error 
patterns of different circuits, with and without faults. We therefore use a new 
error string, created by adding error strings from the fault-free circuit and from 
the circuit with simulated faults. This new string cannot be interpreted as a 
string of error probabilities as in the previous section (it will contain values 
greater than 1.0 for incorrect circuits); but gives a good characterisation of the 
error behaviour of the circuit. The error correlation between two circuits can 
then be computed as the scalar product of the error vectors of two circuits. This 
value will be highest if the two are closely correlated, in other words, if the bits 
with high fault rates are at the same positions in the error strings. 



3.4 Fitness Computation and Selection 

For the fitness computation, we compute the average of the error correlation 
between a candidate circuit and all the circuits in the population. This fitness 
function rewards circuits that have 

— fewer errors when tested without faults, 

— fewer errors when tested with faults, and 

— an error distribution that differs from that of other circuits in the population. 
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Parent selection is done using a simple tournament selection with a tourna- 
ment size of 2. Survival for the next generation is also done using tournament. 
For each offspring we select the worst circuit from a tournament of 2 or 3 of the 
current population, and replace the circuit in the population with the offspring 
if the offspring has an equal or better fitness. Because the population changes 
and the fitness is based on a comparison between the circuits in the population, 
the fitness of each circuit in the population has to be updated after each gener- 
ation. When digital circuits are evolved using the correctness of the output as 
fitness, ’plateaus’ exists in the fitness landscape, and it has been shown that it 
is important that the population is able to move along these plateaus into dif- 
ferent areas of the search space (Vassilev & Miller 2000, T. Yu 2001). Using our 
fitness function, plateaus do not exist in the fitness landscape. The correlation 
introduces dynamic gradients within the sets of individuals with identical bit 
error counts, driving individuals into different areas of the search space. 

4 Experiments 

To evaluate our idea, we have performed a number of tests. In each test, we have 
performed one or more runs of the evolutionary algorithm as outlined above. 
The runs use different setups and termination criteria. At the end of runs, we 
take the population of one or more runs, selected ensembles, and tested their 
performance. The following section describes how we test for fault tolerance. The 
remainder of this section describes the individual tests and results. 



4.1 Testing Fault Behaviour 

As described in Section O we are not so much interested in the behaviour of 
single circuits, but that of ensembles. We select ensembles of three circuits and 
use a simple bitwise majority voter to select the outcome of the computation. 
We then compare this to the outcome of the simulation of three identical circuits 
with a majority voter. 

In order to test the evolved circuits for fault tolerance, we are using a number 
of tests with different fault models. In each case, we only test correct individuals, 
i.e. individuals that produce a correct result in the fault free case (see also 
Section EH- 

I. Two faults with equal fault distribution. In this simulation, we randomly 
pick a circuit and introduce a single fault, and then repeat this for a second 
fault. Each fault will be either a stuck-at-one or a stuck-at-zero fault and 
will hit any of the active nodes with equal probability. We perform this test 
100,000 times and count the number of bit faults in the output string of the 
ensemble. 

For correct circuits, the outcome of these tests can be calculated from the 
error strings of the circuits. The ensemble will return a wrong value if two 
of the circuits return a wrong value. 
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If we define as the probability that a particular bit is wrong given a single 
fault in circuit n, we can compute the fault probability for an ensemble of 
three circuits as 

11111111112, 

gPl*gP2+gPl*gP3+gP2*gP3+gP2*gPl+gP3*gPl = g (P 1 ^2 YPlPS +^2^3 ) ' 

(1) 

For an ensemble of 3 identical circuits, this produces |(plpl). If we repeat 
this for all bits in the output string, we find that the expected number of 
bit-errors is 2/9 of the sum of the cross-correlations. 

2. N{N > 2) faults with equal fault distribution. For this test, we randomly 
pick one of the three circuits and introduce a random fault into the active 
nodes of this circuit. We repeat this until N faults have been introduced, 
taken care not to introduce a fault into the same node more than once. Again, 
we perform this test 100,000 times, and count the total number of bit-faults. 
For this case, the outcome cannot be calculated from the fault strings of the 
individual circuits, as they are only computed for the single-fault case. 

3. N{N > 2) faults with unequal distributions. In real world circuit imple- 
mentation, it is often observed that the active nodes in a circuit are not all 
equally likely to develop a fault, and that the likelihood of different types of 
fault may also vary for the different nodes. Reasons for the variations could 
be proximity to sources of heat or radiation, proximity to other signals, dif- 
ference in the quality of power and clock signals, etc. For identical circuits, 
it is possible that the fault distributions are correlated. In other words, if a 
certain node is more likely to develop a fault in one instance of the circuit, 
this would also be the case in the other two instances. 

This simulation assumes the extreme case, where the fault distribution is 
identical for identical circuits. For each test, we first generate between one 
and three fault distributions. For a circuit with n active nodes, we generate 
a set of fault probabilities 

1 1 -I- (A: — 1) * s 

Pi ~ ■ 7 Pk 7 ■ • ■ 7 Pn “ 7 

C C C 

where c is the normalisation factor and s is a scaling factor which allows 
us to set the largest value to s times the smallest value. We then associate 
these probabilities to the active nodes in the circuit in a random order. These 
values represent the probability that a particular node develops a fault. In 
each test they will be distributed differently over the circuit, but the set of 
probabilities will be the same. If we test an ensemble with identical circuits, 
the order will be the same for all the circuits. If we test different circuits, a 
separate set (depending on the number of active nodes in the circuit) is used 
for each, and the distribution is different for each. 

4.2 First Perfect Individuals 

In this test, we have performed a number of runs with a population size of 200, 
stopping the runs after the first correct individual appeared. We then load the 
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correct individual of 20 different runs into a program and compute the correla- 
tions between the error strings of the individuals. Finally, we pick the ensemble 
of 3 individuals with the lowest sum of cross-correlation between them. The 
results of this test are shown in Table 0 



Table 1. Correct Circuits from 10 runs. 



Number 


Active Nodes 


Error 


Auto-correlation 


0 


23 


13.0 


1.41 


1 


22 


11.66 


1.08 


2 


20 


11.52 


1.13 


3 


25 


10.66 


0.94 


4 


19 


10.63 


0.90 


5 


17 


12.38 


1.30 


6 


15 


14.17 


1.77 


7 


17 


11.53 


1.12 


8 


26 


10.71 


0.96 



The first two columns of Table IQshow the circuit identification number and 
the number of active nodes used in the circuit. The number of active nodes alone 
indicates that the circuits are very different from each other. They use between 
15 and 26 nodes. The following column reports the sum of the values in the 
error string of the circuit, it gives the expected number of bit errors for a single, 
random fault in the circuit. This column shows that the circuits vary strongly 
in their sensitivity to faults, circuit 6 showing more than 30% more errors than 
circuits 3 and 4. Finally, the last column shows the auto-correlation of the error 
string (the correlation with itself). Not surprisingly, it is closely related to the 
error count, but not entirely: circuit 2 has a lower error count, but a higher 
auto-correlation, than circuit 4. 

The node number indicates that the circuits are different, but this does not 
guarantee that the error strings are different. For this, we compare the error 
strings, and calculate the cross-correlations between the circuits. Figure Qshows 
the error strings for two circuits, circuit 4 and 8. As can be seen, they are 
fairly similar to each other for some bits, and vary quite strongly for others. 
Comparing the patterns with other circuits, it becomes apparent that certain 
patterns appear in all circuits. For example, the leftmost 32 bits are the lowest 
bit of the output, in this case the product of the inputs. In a correct circuit, 
this bit is the result of an ‘and’ operation of the lowest two inputs. It could be 
implemented with 2 or more different gates, but there is little evolution can vary 
to improve fault behaviour. For this reason, there is a limit as to how far the 
cross-correlation of circuits can be improved. 

Table Qshows the cross-correlations for 9 circuits. The diagonal again shows 
the auto-correlation. The other fields show the cross-correlation between two 
circuits. The numbers show that the value of the cross-correlation very much de- 
pends on the quality of the individual circuits: circuits with lower auto-correlation 
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Fig. 1. Fault probabilities for bits in the output string for circuits 4 and 8. 



Table 2. Cross-correlations between circuits. 





0 


1 


2 


3 


4 


5 


6 


7 


8 


0 


1.41 


0.94 


1.04 


0.88 


0.94 


1.07 


1.19 


1.01 


0.89 


1 


- 


1.08 


1.00 


0.94 


0.91 


1.12 


1.26 


1.03 


0.94 


2 


- 


- 


1.13 


0.97 


0.97 


1.17 


1.35 


1.09 


0.90 


3 


- 


- 


- 


0.95 


0.87 


1.03 


1.22 


0.95 


0.89 


4 


- 


- 


- 


- 


0.90 


1.04 


1.24 


0.96 


0.81 


5 


- 


- 


- 


- 


- 


1.31 


1.43 


1.19 


0.99 


6 


- 


- 


- 


- 


- 


- 


1.77 


1.33 


1.13 


7 


- 


- 


- 


- 


- 


- 


- 


1.12 


0.91 


8 


















0.96 



generally have better cross-correlations, too. Combining different circuits does 
bring improvements, but only if the auto-correlation of the circuits are similar 
in value. The best cross-correlation is 0.81, between circuits 4 and 8. Compared 
to the best auto-correlation (0.90 for circuit 4). This is an improvement of 10%. 
The importance of using different circuits is therefore small compared to using 
good circuits. 

Determining the best ensemble of three circuits produces an interesting re- 
sult: it consists of two copies of circuit 4 and one of circuit 8. The cross-correlation 
between circuits 4 and 8 is noticeably lower than all the cross-correlations. The 
auto-correlation of circuit 4 is not much larger than other cross-correlations. It 
therefore turns out to be beneficial to only use these two circuits. 



Test Results. With the ensemble of circuits 4, 4, and 8, we have tested the 
performance when 2 or more faults are introduced into the circuits. We only 
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Table 3. Total bit-error count of ensembles, for 100.000 tests. 



Ensemble 


2 faults 


3 faults 


4 faults 


4-4-4 


60,613 


168,891 


312,264 


8-8-8 


64,881 


175,855 


323,992 


4-4-8 


56,364 


156,999 


292,672 



report the test results (Table 0 with equal fault probabilities for all nodes here 
(tests 1 and 2 in Section b.ll . Results of tests with unequal fault probabilities 
are similar to those described in the next section. 

The results of the tests confirm that the ensemble of circuits 4 and 8 performs 
about 10% better than an ensemble of circuit 4 only. It also shows that this 
improvement still holds if more than 2 faults are introduced into the circuits. 
The numbers in the 2 fault case tally well with the numbers expected from 
Equation D for the ensemble with circuit 4 only, the expected number of faults 
is 60000, for the ensemble with circuit 8 only it is 64000. For the mixed ensemble, 
the expected fault number is 56444. 

4.3 Continued Evolution 

The previous section has shown that combining different individuals can im- 
prove the result, but the individual quality of the circuits is more important. 
It therefore suggests itself to try and improve fault tolerance of the individual 
circuits through evolution. We have done another set of runs, this time with 
400 individuals, which continued to evolve after the first correct individual had 
been found. These runs were only stopped after the cross-correlation fitness of 
the best individual had dropped below 0.5. From 6 of these runs, we took all 
correct individuals and compared them as in the previous section. The circuits 
are generally larger, they use between 26 and 31 nodes (see also Section . The 
average number of bit-errors for a single circuit with a single fault lies between 
7.2 and 8.5, clearly a lot better than the results with the first correct circuits. 
Similarly, the auto-correlation lies around 0.5, with the best value of 0.475, 50% 
better than the results of the previous section. Combining different circuits into 
an ensemble again has an advantage of about 10%, reducing the sum of the 
cross-correlations from 1.42 to 1.35. 



Test Results. With the best performing ensemble from the six runs, we have 
again performed fault simulations. Table Elshows the results. 

The tests were performed using unequal fault probabilities, as described for 
test 4, Section 14. 1 1 The scaling factor s is set to 1.0, 2.0, 10.0 and 100.0. With 
s = 1.0, all nodes have an equal probability of developing a fault. As can be 
seen, the ensembles perform much better than those used in the Section 14.21 
The mixed ensemble produces the best results, by about 10%. The effect of the 
correlation introduced by the unevenness in fault distribution can be seen in the 
other numbers: as expected, it leads to a worsening of the fault tolerance for 
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Table 4. Bit-faults for different circuits, after 100000 simulations. 



circuit 


s 


2 Faults 


3 Faults 


4 Faults 


1-1-1 


1.0 


31298 


89267 


170173 




2.0 


32001 


90136 


170532 




10.0 


33366 


94497 


175307 




100.0 


33441 


94866 


179556 


2-2-2 


1.0 


33064 


94998 


179187 




2.0 


34492 


95212 


182274 




10.0 


36016 


102351 


190663 




100.0 


36926 


105852 


197402 


3-3-3 


1.0 


33285 


90685 


175942 




2.0 


33094 


92568 


176384 




10.0 


34050 


98205 


181918 




100.0 


36348 


98931 


185828 


1-2-3 


1.0 


30324 


86175 


165387 




2.0 


30519 


85747 


163070 




10.0 


31319 


86074 


163841 




100.0 


30933 


86167 


162325 



those ensembles with identical circuits. For the ensemble with circuits 1, 2, and 
3 on the other hand, the uneven fault distribution does not show any effect on 
the error numbers. 



5 Other Aspects 

5.1 Fitness Sharing Performance 

As described in Section II. 4J the error string used for the fitness computation 
combines the errors of the circuit in the no-fault case and in the fault simu- 
lation. A circuit that has the same number of bit-errors as other circuits in 
the population, but at different position in the output string, will have a low 
cross correlation. Because the fitness is based on the average correlation between 
circuits, it turns out to be a very effective fitness sharing mechanism. 

For a comparison, we performed a set of runs with 200 individuals, using the 
number of bit errors in the output string as fitness instead. We run these until 
they either found a correct individual, or reached 1,000,000 generations. Of 50 
runs, 18 did not find a correct circuit in this time, the rest needed on average 
252775 generations to find a correct circuit. Compared to this, a set of 33 runs 
using the average cross-correlation all finished in the generation limit, and took 
on average 14715 generations. 

5.2 Evolving Ensembles 

In the early stages of the evolutionary run, the no-fault bit-error count of the 
best ensemble is often noticeably better than that of the best individual. This 
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can be intuitively explained: if random individuals are sampled, we are more 
likely to sample a good individual the more individuals we sample. With an 
ensemble, 100 individuals can be combined into nearly 1,000,000 combinations. 
Not all of these will be different, but the number of samples, and therefore the 
likelihood of a better result, is increased. 

We have tried to make use of this by evolving individuals as part of an en- 
semble. The hope is that this leads to some kind of implicit ’divide and conquer’, 
where each individual is allowed to ignore parts of the output string. So far, how- 
ever, this has not lead to a satisfactory behaviour of the evolutionary runs. The 
reason for this seems to be that if the algorithm finds a new circuits with fewer 
bit-errors than the currently best, this circuit will not automatically improve the 
bit-errors of the best ensemble - in fact is has been observed that the individual 
with the lowest bit-rate quickly got lost again in the population. On the other 
hand, individuals with very large numbers of bit-errors can often become part 
of good ensembles. The result of these two effects is that the runs fail to im- 
prove the bit-error count of the individuals sufficiently to achieve an ensemble 
without bit errors. Changes in the way the selection and fitness calculation is 
done might improve the behaviour. A promising variation we are looking at is 
using ensemble fitness only once we already have correct individuals. This would 
allow the circuits to improve the ensemble error count with faults, by reducing 
the correlation between the error patterns. 

5.3 Other Fault Models 

Our calculations and tests assume a very simple fault model. With different 
fault models, it is likely that some of the results are different. One of the major 
assumptions in our model that is unlikely to be observed in the real world is 
that the likelihood of a fault is independent of the number of nodes in a circuit. 
When we calculate the fault string, we normalise it by the number of tests 
(and therefore nodes); and when we test circuits, circuits will get single faults 
independent of the number of nodes. One result of our experiments was that the 
individuals evolved to have a low average error correlation (Section 14.5 1 were 
generally larger than the first correct circuits (Section n:zi . However, this was not 
always the case, as both experiments had cases where smaller circuits performed 
better. Most likely, if the fault likelihood was to be made proportional to the 
number of nodes, the number of nodes would not grow, but an improvement of 
the error string correlation would still be possible. 

6 Conclusions 

Our experiments show that it is possible to use artificial evolution to create 
digital circuits that have different error patterns when faults are introduced, 
and that this is beneficial for fault tolerance. The experiments also show that the 
sensitivity to faults of individual circuits has a much larger influence on the fault 
tolerance of the ensemble. Continuing the evolution after the first correct circuit 
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is found and using a fitness function that is based on the correlation of fault 
patterns allow us to breed circuits that have a low sensitivity to faults as well as 
high diversity. The variation in the population generated by this fitness function 
also improves the performance of the evolutionary algorithm. The implicit fitness 
sharing scheme we used was shown to reduce the number of generations required 
to find a correct individual by more than a factor of 10. The next step of our 
study is to further research our approach and investigate its real use to large 
and practical circuits. 
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Abstract. Most evolutionary approaches to fault recovery in FPGAs 
focus on evolving alternative logic configurations as opposed to evolving 
the intra-cell routing. Since the majority of transistors in a typical FPGA 
are dedicated to interconnect, nearly 80% according to one estimate, 
evolutionary fault-recovery systems should benefit by accommodating 
routing. In this paper, we propose an evolutionary fault-recovery system 
employing a genetic representation that takes into account both logic and 
routing configurations. Experiments were run using a software model of 
the Xilinx Virtex FPGA. We report that using four Virtex combinational 
logic blocks, we were able to evolve a 100% accurate quadrature decoder 
finite state machine in the presence of a stuck-at-zero fault. 

1 Introduction 

Numerous advantages of Field Programmable Gate Arrays (FPGAs) in space- 
borne electronics have been identified in recent research publications EE3 and 
manufacturers’ literature EUni- Benefits include reconfiguration capability to 
support multiple missions, the ability to correct latent design errors after launch, 
and the potential to accommodate on-chip and off-chip failures. Ground Support 
Equipment (GSE) based FPGA applications primarily employ reprogrammable 
devices as a means of amortizing development costs over multiple missions. In 
GSE-enabled applications such as Reusable Launch Vehicles (RLVs), FPGAs 
are configured or replaced between missions rather than being reprogrammed 
during flight. For applications such as RLVs, comparatively short mission dura- 
tions and low levels of ionizing radiation are involved. Hence for many ground 
reconfigurable applications, conventional Triple Modular Redundancy (TMR) 
techniques often provide sufficient fault handling coverage. 

On the other hand, in-mission reconfigurable FPGAs are advantageous for 
deep space probes, satellites, and extraterrestrial rovers. In these applications, 
the radiation exposures, mission durations, and repair complexities are signifi- 
cantly greater. The need for adequate fault coverage during these missions has 
become further intensified by the increasing number of FPGAs being deployed. 
For instance, NASA’s Stardust probe contains over 100 FPGA devices. Although 
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the Stardust’s FPGAs are based on a non-reprogrammable antifuse-based tech- 
nology, a more recent space-qualified SRAM-based technology has become com- 
mercially available. 

In SRAM-based devices, the number of programming cycles is unlimited. 
Hence new techniques become feasible for active recovery through reconfigura- 
tion of a compromised FPGA. The approach developed here concentrates on 
autonomous reconfiguration of SRAM-based devices while in-flight. The experi- 
ments conducted involve Xilinx’s SRAM-based Virtex parts from the same device 
family as the space-qualified QPRO radiation-hardened series. 

Permanent Single-Event Latchup (SEL) failures may impact GLBs and/or 
programmable interconnections within the FPGA itself. They may also involve 
other supporting devices that the FPGA interfaces with or processes data from. 
These failure modes also suggest that the ability to derive an alternative FPGA 
configuration in-situ would be beneficial. Likewise, SEL exposures exist with re- 
gards to the data processing path within the FPGA that is not involved with the 
device’s programmable configuration. In the above cases, the FPGA configura- 
tion derived at design time will no longer provide the required functionality for 
the damaged part. Traditionally, redundant spares have been utilized to replace 
the damaged device. 

Autonomous repair can work in concert with or provide an alternative to 
device redundancy. While redundant spares exist only in limited quantities, evo- 
lutionary recovery methods attempt to facilitate repair through reuse of dam- 
aged parts. Hence the potential benefits are two-fold. First, one or more failures 
might be accommodated by reconfiguring the failed part without incurring the 
increased weight, size, or power traditionally associated with providing redun- 
dant spares. Second, the characteristics of the failure need not be precisely diag- 
nosed in order to be repaired. Here the repair is performed in-situ via intrinsic 
evaluation of the device’s remaining functionality. This implies that any resid- 
ual functionality, including the electrical characteristics of both the damaged 
device and its interaction with any supporting devices, is taken into account 
when realizing the repair. After isolating the fault to a size that is manageable 
for the evolutionary algorithm, alternate solutions are refined though iterative 
selection. This can be carried out without detailed knowledge of the underlying 
failure mechanism itself. 

The approach developed here attempts to regain lost functionality due to 
a fault by evolving a new configuration on the defective FPGA. We assume a 
dual-redundant FPGA system whereby the faulty FPGA undergoes evolution 
to recover its functionality while the redundant FPGA maintains proper func- 
tionality during evolution on the faulty FPGA. Thus after a fault is detected, 
redundancy is lost for a short period of time and then restored. Application 
functionality is maintained throughout this process under the assumption that 
only one of the FPGAs fails. Our results are that the evolutionary methods are 
able to fully recover from a simulated stuck-at-zero fault in the input of a state 
machine implementing a quadrature decoder. Several research challenges remain 
and they are also discussed. 




A Genetic Representation for Evolutionary Fault Recovery in Virtex FPGAs 



49 



2 Related Work 

Recently, various evolutionary algorithm approaches have been proposed for 
fault-recovery of FPGAs. Some previous work applies evolutionary algorithms 
prior to the occurrence of the fault while other approaches attempt to repair the 
fault after its occurrence. Some techniques involve intrinsic evolution using the 
failed part itself. Others rely on extrinsic evolution of an abstracted model of 
the devices. 

Three examples of recent work that apply evolutionary algorithms to realize 
fault-tolerant designs include @], and In IHll . Miller examined proper- 
ties of messy gates whereby evolved logic functions inherently contain redundant 
terms as their functional boundaries change and overlap. In |^, Canham and 
Tyrrell compare the fault tolerance of oscillators evolved by including a range of 
fault conditions within the fitness measure during the evolutionary process. A 
population-based approach scores evolved designs using a fitness function corre- 
sponding to desired operation based on the absence of faults. When evolution is 
complete, an additional pass evaluates the ability of the evolved individuals to 
tolerate a range of faults, and the most fault-tolerant individuals are retained. 
In m. the evolution of designs containing redundant capabilities without the 
designer having to explicitly specify the redundant parts themselves was inves- 
tigated. To achieve this, a range of fault cases was introduced throughout the 
evolution process. This allowed individuals to exploit whatever component be- 
haviors exist, even behaviors known to be faulty. 

An evolutionary fault-recovery approach is described by Vigander [il4j . He 
develops a genetic algorithm to restore functionality after random faults are in- 
jected into a 4-bit by 4-bit multiplier using standard genetic operators. He sim- 
ulated the repair of the prior-designed multiplier that consisted of feed-forward 
interconnection of hypothetical FPGA cells capable of 8 different logic functions. 
He used as his fitness function the number of correct input-output mappings from 
the 256 possible input combinations that could be applied to the multiplier. He 
demonstrated that while it is not exceedingly difficult to derive a solution that 
can produce a nearly correct repair, completely correct repairs present a chal- 
lenging problem. To remedy this, he demonstrated that a voting system with as 
few as three alternatively evolved repaired circuits was capable of producing a 
majority output that was completely correct. 

3 Representation and Operators 

Several goals were taken into account while designing the representation scheme. 
Amenability to recombination is of course a primary concern. After that, our 
priorities were to let the GA work in the largest, most flexible design space 
as possible: we wanted to allow all possible LUT configurations and allow the 
maximum number of GLB interconnections given the constraints of hardware 
routing support (we will say more about the routing at the end of this section) . 
We also wanted to disallow illegal configurations and to minimize non-coding 
alleles (introns). 
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Bitstring representations are a natural choice for FPGA applications, and 
many times the raw configuration string can be used as the representation. In 
our case, we chose a bitstring representation mainly out of convenience in pro- 
gramming. Since we knew that only a handful of CLBs would be evolved, our 
bitstrings would be at most 1000 bits long. We acknowledge that this approach 
would likely suffer as more CLBs were utilized and the corresponding bitstring 
enlarged to thousands of bits. 

The representation is shown in Figure QThis scheme is comprised of multiple 
128-bit fields, one for each CLB. Within each CLB field are a number of sub- 
fields that specify each of the LUT bits and remote connections. There are 16 
bits that specify the contents of each LUT. Each LUT has four inputs, and 
since each of these inputs can be connected to other LUT outputs, the remote 
CLB/LUT requires addressing bits. Since our system will be comprised of four 
CLBs, we need only two bits to specify the remote CLB, and another two bits to 
specify the particular LUT within the CLB. This pattern of sub-fields continues 
for each LUT until all the LUTs in the CLB are accounted for. An illustration 
of the CLBs, LUTs and sample routing is shown in Figure Q 

The operators employed were crossover and mutation. Two-point crossover 
was implemented using cut points allowed between bits. Mutation was applied 
on individual bits. 

Regarding the routing, it is chosen automatically by the JBits software. Our 
circuits are sufficiently small that we have never experienced a situation where 
a route could not be found. Successful routes have been found routing 1 LUT 
output to 48 different inputs (the maximum number of inputs available in a 
2-by-2 circuit, where 1 CLB is dedicated to external inputs). It is theoretically 
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possible that in some larger designs the routing will become so dense that some 
routes would not be found. If such a case were to occur, the specific route will 
simply not get connected. Such individual would then most likely receive a low 
fitness score, and be automatically eliminated from the gene pool. 

4 Fault Recovery of Quadrature Decoder 

The quadrature decoder [0 was selected as an initial case study for testing and 
refinement of our evolutionary recovery strategy. It represents a NASA applica- 
tion of manageable size that is appropriate for tuning of the GA. Quadrature 
decoders provide a means of counting objects passed back and forth through 
two beams of light, or alternatively determining the angular displacement and 
direction of rotation of an encoder wheel turning about its axis. A quadrature 
decoder that determines the direction of rotation of a shaft is shown in Figure Q 





Fig. 3. Rotating shaft application for a quadrature decoder. 



The concept of operation for the quadrature decoder is that the objects, or 
opaque arcs on the rotating wheel, to be counted will first obscure and then 
move past the two light beams in succession. The order in which the beams are 
cleared can be used to ascertain the direction of rotation. The use of two beams 
acts to preclude false counts due to jitter or bounce resulting from multiple 
phantom reads. For example, to have a valid increment in the rotational count, 
both beams must be cleared in succession. 

To implement the encoder, it is possible to employ a state machine that keeps 
track of the beam activity. The state machine accepts two single-bit inputs which 
are asserted only when the corresponding sensor is obscured. When a change of 
the inputs occurs, the state machine transitions to its next internal state. The 
state machine is asynchronous and outputs a zero bit if the wheel is rotating in 
one direction, and a one bit if the wheel is rotating in the opposite direction. If 
the wheel is not rotating, the output is the same as it previously was. The finite 
state machine for the quadrature decoder is shown in Figure ^ 
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Fig. 4. Quadratnre decoder finite state machine. 




5 Experimental Setup and Results 

The software system used is depicted in Figure 0 The entire system is imple- 
mented in software. The GA software is ECJ, a Java-based evolutionary com- 
putation and genetic programming system by Sean Luke of George Mason Uni- 
versity. EGJ is augmented by our code for tasks like decoding individuals and 
calculating fitness. The GA sits on top of Xilinx Gorporation’s JBits software 
[EEl, a set of Java classes which provide an Application Programming Inter- 
face to access the Xilinx FPGA bitstream. Xilinx’s Virtex DS software, which 
simulates the operation of Virtex devices, is used to test candidate solutions. Bor- 
land’s JBuilder Java environment is used for development and to run the system, 
though Sun Microsystem’s Java virtual machine is used beneath JBuilder. 

To evaluate the fitness of an individual, an input stream of 500 bit pairs is 
used. These inputs attempt to fully exercise the evolving finite state machines. 
The quad decoder inputs are supplied at specified clock intervals, which have 
nothing to do with how often the finite state machine changes state. The quad 
decoder inputs can change at any frequency below the frequency of the decoder 
clock ~ they might never change (if the wheel is standing still), or they might 
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change at every clock cycle (if the wheel is rotating at the maximum allowed 
velocity) . 

The output stream consists of 510 bits sampled across all four CLBs. Ten 
bits have been added to allow for the delay (in clock cycles) between the time the 
decoder inputs are fed into the decoder, and the time the output of the decoder 
is read. Such arrangement allows us to read the output of the decoder from 1 
to 11 clock cycles after the inputs have been fed into the circuit. Adding ten 
bits gives ten output stream windows of length 500, with each output stream 
shifted by 1-bit from the next. Sampling across all the CLBs allows the GA to 
maximum flexibility in building the FSM. Thus, fitness is expressed as: 

F = max (CLB() 

i=i,4;j=o,g' 

where CLB^ represents the number of correct output bits from the Rh CLB 
shifted by j clock ticks. The fitness is simply the highest number of correct 
output bits seen across all of the CLBs and across the ten output windows. The 
best score is 500, and the worst score is 0. 

The genetic algorithm was set up as shown in Table D Small population 
sizes were necessary since an unfixablememory leak was present in one of the 
pre-compiled modules. 



Table 1. GA parameters. 



Number of generations 


1000 


Population size 


40 


Tournament Size 


4 


Elitist Individuals 


2 


Gen 0 Seeding 


20 individuals 


Grossover rate 


0.8 


Mutation rate 


0.002 per bit 



Approximately 10 experimental runs were conducted using smaller input bit- 
streams of 100 bit pairs. These were found to evolve finite state machines that 
were tuned to the test cases, but not robust when interrogated with out of sam- 
ple input test streams. Two runs were conducted using 500 bit pairs and one 
these runs was able to evolve a 100% accurate quadrature decoder finite state 
machine in the presence of an induced fault. The location of the fault was chosen 
at random, although we made sure that it would adversely affect the function- 
ality of the seeded circuits. Once the fault is present, we assume that it does 
not get removed (however, if it does, our algorithm can start evolving the circuit 
configuration again). We assumed that the circuit is operating properly prior to 
the fault, and the evolution is started once the fault is detected. 

The best evolved configuration was found in generation 623 and is shown in 
Figure Q Two of the 16 LUTs went unused which is not surprising given that 
the FSM can be implemented with about 10 LUTs. The GA exploits the induced 
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Fig. 6. Evolved configuration showing routing, LUT contents, and simulated fault. 
Inputs are on the lines labeled MSB and LSB, referring to the least/most significant bit 
of the input (channel A and B inputs). Wires that are shown crossing perpendicularly 
(eg, +) are unconnected - only wires that have T junctions are connected. 



fault to its advantage because if you remove the fault in the evolved solution, 
it no longer functions correctly - it achieves an accuracy of only 93.8%. Also, 
note that the input LUTs had mostly zeros in their tables. This is because we 
fix most of those bits to zero in the genome since they do not affect the LUT’s 
function. However, the “corner” bits of each of those input LUTs are involved 
in processing the input, and therefore, are evolved. 

The GA performance curve for this run is shown in Figure Q The run ramps 
up quickly showing that useful search is underway, however, the average fitness 
is stagnant for about 300 generations, which is not encouraging. The runs are 
quite slow to execute on a 2 GHz Pentium 4 PG. Runtimes were about 45 hours 
since each evaluation takes approximately 6 seconds. 

6 Discussion 

Evolutionary systems for fault recovery on FPGAs may be an important tool in 
the quest for ever-higher levels of fault tolerance in NASA missions and other 
applications. We have demonstrated a system that is able to evolve a realistic 
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Fig. 7. GA performance cnrve. The top curve is the best individual’s fitness at each 
generation and the bottom curve is the average fitness. 



spacecraft control function in the presence of a permanent stuck-at fault. Using 
a software simulation of an FPGA, we constructed a genetic representation that 
included both logic and routing information, and ran a genetic algorithm to 
evolve a quadrature decoder. As is typical in evolutionary algorithm applications, 
the evolved solution exploits its resources in unexpected ways. In our case, the 
algorithm made use of the fault itself in constructing its solution. If there is 
economy to be gained by exploiting damaged resources, that is certainly a benefit 
largely unique to evolutionary search. 

Potential advantages of this approach are handling a wider range of errors, 
and relaxing the requirement of fault location/isolation. An autonomous fault 
recovery system would be possible if the evolution could be done at sub-second 
speeds. Future work includes investigation of scalability to more complex logic 
functions and systems that have multiple induced faults. Speeding up the eval- 
uation cycle by doing evolution directly in hardware is our next line of research. 
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Abstract. We describe the combination of a novel, biologically plausible model 
of development with a genetic algorithm. The Evolutionary Developmental 
System is an ohject-oriented model comprising proteins, genes and cells. The 
system permits intricate genomic regulatory networks to form and can evolve 
spherical embryos constructed from balls of cells. By attempting to duplicate 
many of the intricacies of natural development, and through experiments such 
as the ones outlined here, we anticipate that we will help to discover the key 
components of development and their potential for computer science. 



1 Introduction 

Talk to any evolutionary biologist and they’ll tell you that the standard genetic algo- 
rithm (GA) does not resemble natural evolution very closely. While our GAs may 
evolve their binary genes, most biologists would be horrified to discover that concepts 
such as genotype and phenotype are so blurred in evolutionary computation that some 
researchers make no distinction between the two. Should you have the courage to go 
and talk to a developmental biologist, you’ll have an even worse ear-bashing. You’ll 
be told that development is the key to complex life. Without a developmental stage 
from genotype to phenotype, all you have is a big DNA or RNA molecule. With de- 
velopment you can have layer upon layer of complexity, from cells to organs to or- 
ganisms to societies. 

Of course our motivations in computer science are often very different from the 
motivations of biologists. Nevertheless, it has long been the goal of evolutionary 
computationists to evolve complex solutions to problems without needing to program- 
in most of the solution first. The dream of complex technology that can design itself 
requires rejection of the idea of knowledge-rich systems where human designers dic- 
tate what should and should not be possible. In their place we need systems capable of 
building up complexity from a set of low-level components. Such systems need to be 
able to learn and adapt in order to discover the most effective ways of assembling 
components into novel solutions. And this is exactly what developmental processes in 
biology do, to great effect. 

In this paper we present, for the first time, an overview of a novel biologically 
plausible model of development for evolutionary design. This system is intended to 
model biological development very closely in order to discover the key components 
of development and their potential for computer science. The paper is divided into 
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sub-sections covering different aspects of the Evolutionary Developmental System 
(EDS). It begins with an overview of the entire system, followed by sections detailing 
individual components in isolation. These individual components are then drawn 
together, and how they work as part of the overall developmental system is detailed as 
well as the role of evolution and how the genetic algorithm is wrapped around the 
developmental core. Finally we present some examples of results generated during 
on-going experiments. 



2 Background 

Development is the set of processes that lead from egg to embryo to adult. Instead of 
using a gene for a parameter value as we do in standard EC (i.e., a gene for long legs), 
natural development uses genes to define proteins. If expressed, every gene generates 
a specific protein. This protein might activate or suppress other genes, might be used 
for signalling amongst other cells, or might modify the function of the cell it lies 
within. The result is an emergent “computer program” made from dynamically form- 
ing gene regulatory networks (GRNs) that control all cell growth, position and behav- 
iour in a developing creature [Bentley, 2002]. 

The field of Computational Development has matured steadily over the past decade 
or so, with work touching upon a wide range of aspects of development ranging from 
its use for the construction of neural net robot controllers [Jakobi, 1996], to the large 
scale modelling of morphogenesis [Fleischer, 1993]. 

Recently a resurgence of interest into computational development has fuelled much 
research. Problems of scalability, adaptability and evolvability have led many re- 
searchers to attempt to include processes such as growth, morphogenesis or differen- 
tiation in their evolutionary systems [Eggenberger, 1996; Haddow et al, 2001; Bon- 
gard, 2002; Miller 2002]. For reviews see [Kodjabachian and Meyer, 1994; Kumar 
and Bentley, 2002]. 



3 The Evolutionary Developmental System (EDS) 

In nature, development begins with a single cell: the fertilised egg, or zygote. In addi- 
tion to receiving genetic material from its two parents, the zygote is seeded with a set 
of proteins - the so-called ‘maternal factors’ deposited in the egg hy the mother 
[Wolpert, 1998]. The maternal factors trigger development causing the zygote to 
cleave (fast cell division with no cell growth). After cleavage, normal cell division 
begins; as cells divide they inherit the state of their parents. To ensure the embryo is 
not homogenous, one or two asymmetric divisions occur, resulting in an unequal 
distrihution of factors to the daughter cells. In doing so, cells become different from 
one another. 

Development is controlled by our DNA. In response to proteins, genes will be ex- 
pressed or repressed, resulting in the production of more (or fewer) proteins. The 
chain-reaction of activation and suppression both within the cell and within other 
nearby cells through signaling proteins and cell receptors, causes the complex proc- 
esses of cellular differentiation, pattern formation, morphogenesis and growth. 
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The Evolutionary Developmental System is an attempt to encapsulate many of these 
processes within a computer model. At the heart of the EDS lies the developmental 
core. This implements concepts such as embryos, cells, cell cytoplasm, cell wall, 
proteins, receptors, transcription factors (TEs), genes, and cis-regulatory regions (see 
figure 1 for a graphical view of the EDS). Genes and proteins form the atomic ele- 
ments of the system. A cell stores proteins within its cytoplasm and its genome 
(which comprises rules that collectively define the developmental program) in the 
nucleus. The overall embryo is the entire collection of cells (and proteins emitted by 
them) in some final conformation attained after a period of development. A genetic 
algorithm is wrapped around the developmental core. This provides the system with 
the ability to evolve genomes for the developmental machinery to execute. 



Cell surface receptors 
proteins used as 
‘sensors’ to detect 
signalling proteins by 



Cell 

Membrane 

DNA 

Genome 
based cis - 
regulatory 
region. 
Ligands bind 
to these sites 
& trigger 
transcription 




signalling 

protein 



Fig. 1. A single cell in the Evolutionary Developmental System 



4 Components of the EDS 

The following sections describe the main components of the developmental model: 
proteins, genes and cells. 
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4.1 Proteins 

In nature, proteins are the driving force of development. They are macromolecules, 
long chains of amino acids that assemble at protein production sites known as ri- 
bosomes. The only function of genes is to specify proteins. 

The EDS captures the concept of a protein as an object. Each protein has an ID tag, 
which is simply an integer number. The EDS uses eight proteins (although number of 
proteins used is a user defined variable in the system). Protein objects contain both a 
current and a new state object (at the end of each developmental cycle all protein new 
states are swapped with current states to provide “parallel” protein behaviour). These 
protein state objects house important protein-specific information, for example, the 
protein diffusion co-efficient. 



Protein Creation, Initialisation, and Destruction. In the EDS, proteins do not exist 
in isolation; they are created and owned by cells. Thus, during protein construction 
each protein is allocated spatial co-ordinates inherited from the cell creating the 
protein. Handling protein co-ordinate initialisation using this method overcomes the 
problem of knowing which cell created which proteins. 

A protein lookup table (extracted from the genome, see next section) holds details 
about all proteins and is used to initialise each protein upon creation. It has the fol- 
lowing details for each protein: 



Rate of Synthesis 


amount by which the protein is synthesised 


Rate of Decay 


amount by which the protein decays 


Diffusion coefficient 


amount by which the protein diffuses 


Interaction strength 


strength of protein interaction, i.e., activation or inhibition 


Protein Type 


ID tag, e.g., long-range hormone, or short-range receptors 



Additionally, each protein keeps the following variables: 

Bound? whether or not a receptor protein is currently bound ^ 

Protein Source Concentration the current concentration of the protein 
Spatial coordinates the position of the source of the protein 

Protein destruction in the EDS is implemented by simply setting the protein’s 
source concentration to zero: if the concentration is zero there can be no diffusion, 
unless more of the protein is synthesised. 

Protein Diffusion. Diffusion is the process by which molecules spread or wander due 
to thermal motions [Alberts et ah, 1994]. When molecules in liquids collide, the result 
is random movement. Protein molecules are no different: they diffuse. 

The average distance that a molecule travels from its starting point is proportional 
to the square root of the time taken to do so. For example, if a molecule takes on av- 
erage 1 second to move 1 |im, it will take 4 seconds to move 2 pm, 9 seconds to move 



'The bound variable is only operational in receptor proteins. 
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3 )im, and 100 seconds to move 10 jim. Diffusion represents an efficient method for 
molecules to move short distances, but an inefficient method to move over large dis- 
tances. Generally, small molecules move faster than large molecules (Alberts et ah, 
1994). 

Protein diffusion in the EDS models this behaviour. Diffusion is implemented by 
using a Gaussian function centred on the protein source. The use of the Gaussian 
assumes proteins diffuse equally in all directions from the cell. 

In more detail: the source concentration records the amount of the current protein. 
Every iteration, its value is decremented by the corresponding ‘rate of decay’ parame- 
ter. If expressed by a gene, its value is also incremented by the corresponding ‘rate of 
synthesis’ parameter. To calculate the concentration of a protein at a distance x from 
the protein source: 

_ 2 

concentration = sxe^''^ 

Where: d is the diffusion coefficient of the current protein. 

X is distance from protein source to current point 
s is the current protein source concentration. 

Eigure 2 illustrates the way protein concentration changes according to the three 
variables: distance, diffusion coefficient and source concentration. 



Fig. 2. Plot of protein concentration against distance from source, where: d = 0.5 and i = 1.0 
(left), d = 0.5 and i = 2.0 (middle), and d= 1.5 and s = 1.0 (right). 



4.2 Genes 

The EDS employs two genomes. The first contains protein specific values (e.g., syn- 
thesis, decay, diffusion rates, see above). These are encoded as real floating-point 
numbers. The second describes the architecture of the genome to be used for devel- 
opment; it describes which proteins are to play a part in the regulation of different 
genes. It is this second genome that is employed by each cell for development; the 
information evolved on the first genome is only needed to initialise proteins with their 
respective properties. 

In Nature, genes can be viewed as comprising two main regions: the cis-regulatory 
region [Davidson, 2001] and the coding region. Cis-regulatory regions are located just 
before (upstream of) their associated coding regions and effectively serve as switches 
that integrate signals received (in the form of proteins) from both the extra-cellular 
environment and the cytoplasm. Coding regions specify a protein to be transcribed 
upon successful occupation of the cis-regulatory region by assembling transcription 
machinery. Currently, the EDS’s underlying genetic model assumes a “one gene, one 
protein” simplification rule (despite biology’s ability to construct multiple proteins); 
this aids in the analysis of resulting genetic regulatory networks. To this end, the 
activation of a single gene in the EDS results in the transcription of a single protein. 
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This is currently ensured by imposing the following structure over genes: each gene 
comprises both a cis-regulatory region and a consequent protein-coding region. 

A novel genome representation (based on eukaryotic genetics) was devised for de- 
velopment in the EDS. This genome is represented as an array of Gene objects 
(fig. 3). Genes are objects containing two members: a cis-regulatory region and a 
protein-coding region. The cis-regulatory region contains an array of TF target sites; 
these sites bind TFs in order to regulate the activity of the gene. 



Gene 1 Gene 2 Gene 3 Gene 4 




Fig. 3. An arbitrary genome created by hand. Genes consist of two objects: a cis-regulatory 
region and a coding region. Cis-regulatory regions consist of transcription factor target sites 
that bind TFs, triggering transcription of the coding region. Each number denotes a protein. 



The gene then integrates these TFs and either switches the gene ‘on’ or ‘off. Inte- 
gration is performed by summing the products of the concentration and interaction 
strength (weight) of each TF, to find the total activity of all TFs occupying a single 
gene’s cis-regulatory region: 

d 

a = ^ conc- * int eraction _ strength- 

i=i 

where: a is the total activity, i is the current TF, 

d is the total number of TF proteins visible to the current gene, 
cone, is the concentration of i at the centre of the current cell, 
interaction_strength^ is the strength of protein interaction for the current TF 
(see previous section). 

This sum provides the input to a logistic sigmoid threshold function (a hyperbolic 
tangent function), which yields a value between -1 and 1. Negative values denote 
gene repression and positive values denote gene activation: 

g(a) = tanh(fl) = ^ ^ 
e+e 

Figure 4 illustrates this sigmoid calculation used to determine whether a gene is ac- 
tivated and produces its corresponding transcription factor or not. 

4.3 Cells 

Cells can be viewed as autonomous agents. These agents have sensors in the form of 
surface receptors able to detect the presence of certain molecules within the environ- 
ment. Additionally, the cell has effectors in the form of hundreds and thousands of 
protein molecules transcribed from a single chromosome able to affect other genes in 
other cells. Cells resemble multitasking agents, able to carry out a range of behav- 
iours. For example, cells are able to multiply, differentiate, and die. 
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INPUTS 



OUTPUT 




Fig. 4. A gene showing the various positive and negative inputs received in the form of tran- 
scription factors, with their respective affinities (weights), and concentrations of 0.24, 0.87, and 
0.1 1 respectively. Internally, the gene integrates these TFs and decides whether or not to switch 
the gene ‘on’ or ‘off. TF, and TF^ are both activators, whereas TF^ is a repressor, denoted by a 

symbol. 

Like protein objects, cell objects in the EDS have two states: current and new. Dur- 
ing development, the system examines the current state of each cell, depositing the 
results of the protein interactions on the cell’s genome in that time step into the new 
state of the cell. After each developmental cycle, the current and new state of each 
cell is swapped ready for the next cycle. 

The EDS supports a range of different cell behaviours, triggered by the expression 
of certain genes. These are currently: division (when an existing cell “divides”, a new 
cell object is created and placed in a neighbouring position), differentiation (where the 
function of a cell is fixed, e.g., colour = “red” or colour = “blue”), and apoptosis (pro- 
grammed cell death). 

The EDS uses an n-ary tree data structure to store the cells of the embryo, the root 
of which is the zygote. As development proceeds, cell multiplication occurs. The 
resulting cells are stored as child nodes of the root in the tree. Proteins are stored 
within each cell. When a cell needs to examine its local environment to determine 
which signals it is receiving, it traverses the tree, checks the state of the proteins in 
each cell against its own and integrates the information. 



4.4 Evolution 

A genetic algorithm (GA) is “wrapped around” the developmental model. The GA 
represents the driving force of the system. Its main roles are to: 

1 . provide genotypes for development; 

2. provide a task or function, and hence a measure of success and failure; and 

3. search the space of genotypes that give rise to developmental programs capable of 
specifying embryos, correctly and accurately according to the task or function. 

Individuals within the population of the genetic algorithm comprise a genotype, a 
phenotype (in the form of an embryo object), and a fitness score. After the population 
is created, each individual has its fitness assessed through the process of development. 
Each individual is permitted to execute its developmental program according to the 
instructions in the genome. After development has ended a fitness score is assigned to 
the individual based upon the desired objective function. 
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The EDS uses a generational GA with tournament selection (typically using 14 of 
population size), and real coding. Crossover is applied with 100% probability. Creep 
mutation is applied with a Gaussian distribution (small changes more likely than large 
changes), with probability between 0.01 and 0.001 per gene. 




Fig. 5. Isopatial coordinates permit twelve equidistant neighbours for each cell (left) and are 
plotted using six axis (right). 




Fig. 6. Examples of proteins with their associated cell (at centre). Left: single cell emitting a 
long-range hormone-type protein. Middle: single cell emitting a short range (local) protein. 
Right: single cell emitting four proteins of various spread, reflected by the radius of each pro- 
tein sphere. 



4.5 Coordinates and Visualisation 

The underlying co-ordinate system used by the EDS is isospatial. All coordinate sys- 
tems have inherent biases towards different morphologies; the isospatial system is no 
different. However, the isospatial system bias results in what can only be described as 
more natural (biologic) morphologies than its Cartesian counterpart [Frazer, 1995]. 
Isospatial co-ordinates permit a single cell to have up to twelve equidistant neigh- 
bours defined by 6 axis (fig. 5), Cartesian co-ordinates only permit 6. 

The EDS automatically writes VRML files of developed embryos, enabling three- 
dimensional rendered cells and proteins to be visualised. Cells are represented by 
spheres of fixed radius; proteins are shown as translucent spheres of radius equal to 
the extent of their diffusion from their source cells. In order to place a cell in VRML 
its Cartesian co-ordinates need to be defined: to this end, isospatial co-ordinates are 
converted to Cartesian. Figure 6 illustrates how cells and proteins appear when ren- 
dered. 
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5 Experiments 

Because of the complexity of the system, numerous experiments can be performed to 
assess behaviour and capabilities. Here (for reasons of space) we briefly outline two: 

1 . The ability of genes and proteins to interact and form genomic regulatory networks 
within a single cell. 

2. The evolution of a 3D multi-cellular embryo with form as close to a prespecified 
shape as possible. 

5.1 Genetic Regulatory Networks 

In order to assess the natural capability of the EDS to form GRNs independently of 
evolution, genomes of five random genes were created and allowed to develop in the 
system for ten developmental steps. The cell was seeded with a random set of eight 
proteins (maternal factors). 
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Fig. 7. Gene expression patterns for a run of a randomly created genome seeded with a random 
subset of proteins. The left side shows the raw output from the system where an ‘x’ means the 
gene in that column is ‘on’ and means the gene is ‘off. The right side depicts this text pat- 
tern as a graphical output viewed as a ID CA iterated over ten time-steps. Note, gene 4, i.e., [ 0 
I 0 ] is autocatalytic. 




66 Sanjeev Kumar and Peter J. Bentley 



Figure 7 (top) shows an example of the results of this experiment. The pattern 
shows gene four exhibiting autocatalytic behaviour having initially bound to protein 
zero. (Gene four is activated when in the presence of protein zero, and produces pro- 
tein zero when activated.) 

Figure 7 (bottom) shows an example of the pattern that results when the initial ran- 
dom proteins (initial conditions) are varied very slightly, but the genome is kept con- 
stant. Again, gene four shows the same autocatalytic behaviour, but now the GRN has 
found an alternative pattern of activation. These two runs illustrate the difference the 
initial proteins can make on the resulting GRN. 



5.2 Morphogenesis: Evolving a Spherical Embryo 

In addition to GRNs, the other important capability of the EDS is cellular behaviour. 
The second experiment focuses on morphogenesis, i.e., the generation of an embryo 
with specific form, constructed through appropriate cellular division and placement, 
from an initial single zygote. For this experiment, the genetic algorithm was set up as 
described previously, with the fitness function providing selection pressure towards 
spherical embryos of radius 2 (cells have a radius of 0.5). 




Fig 8. Six random initial embryos. 

Figure 8 shows examples of the initially random embryos with their corresponding 
proteins produced by the GRNs. Figure 9 shows two examples of final “spherical” 
embryos. As well as having appropriate forms, it is clear that the use of proteins has 
been reduced by evolution. Interestingly, analysis indicates that evolution did not 
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require complex GRNs to produce such shapes. It seems likely that it is the natural 
tendency of the EDS to produce near-spherical balls of cells, hence evolution simply 
did not need to evolve intricate GRNs for this task. Further experiments to evolve 
more complex morphologies are under way. 




Fig. 9. Two “spherical” embryos. Using the equation of a sphere as a fitness function with 
sphere of radius 2.0. 



6 Summary 

The staggering complexities of nature result from a combination of evolution and 
development. This work has described the combination of a novel, biologically plau- 
sible model of development with a genetic algorithm. We have shown how an Evolu- 
tionary Developmental System can be constructed based on an object-oriented model 
of proteins, genes and cells. We have also described how this system permits intricate 
genomic regulatory networks to form and can evolve spherical embryos constructed 
from balls of cells. By attempting to duplicate many of the intricacies of natural de- 
velopment, and through experiments such as the ones outlined here, we anticipate that 
we will help to discover the key components of development and their potential for 
computer science. Further experiments and analysis are ongoing. 
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Abstract. Inspired by biological development, we wish to introduce a 
circuit-DNA that may be developed to a given circuit design organism. 
This organism is a member of a Virtual EHW FPGA species that may 
be mapped onto a physical FPGA. A rule-based circuit-DNA, as used 
herein, provides a challenge to find suitable rules for artificial develop- 
ment of such an organism. Approaching this challenge, the work herein 
may be said to be of an investigative nature, to explore for developmental 
rules for developing even the simplest organism. 

The artihcial developmental process introduced herein, uses a knowl- 
edge rich representation including both knowledge of the circuit’s (or- 
ganism’s) building blocks and local knowledge about neighbouring cells. 
Initial experiments for knowledge rich development on our virtual tech- 
nology platform, are presented. 



1 Introduction 

Generally in EHW, a one-to-one mapping i.e. a direct mapping, has been chosen 
for the genotype-phenotype transition. In a one-to-one mapping the genotype is 
a complete blueprint of the phenotype, including all information of the assembled 
phenotype. In the case of an FPGA, the genotype holds the configuration data 
for the FPGA. The larger the circuit to be evolved, the more logic and routing 
information (configuration data) is required in the genotype. As the complexity 
of the genotype increases, so increases the computational and storage require- 
ments of the evolutionary process. These high resource requirements limit the 
complexity, especially functional complexity, of circuits that can realistically be 
evolved. 

Moving away from a direct mapping to an indirect mapping we have the 
possibility of introducing a smaller genotype for a large phenotype. Turning to 
nature, we see that DNA is indirectly mapped to the developed organism. DNA 
is a building plan of how to assemble the organism rather than a blueprint of 
it. Following nature’s example, we could achieve a smaller genotype in the form 
of a building plan for a given complex circuit and use artificial development to 
develop our organism. 
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By simplifying or rather shrinking the size of the genotype we effectively 
move the complexity problem over to the genotype-phenotype mapping thus 
increasing the complexity of the mapping. In this work, artificial development 
is this mapping and, as such, we need a development process that can handle 
such a complex mapping. To simplify this problem, we have chosen to split 
the mapping into two simpler mappings by introducing a virtual technology 
between the genotype and phenotype. In addition, the development process is 
now developing towards a phenotype platform more attuned to developmental 
design than today’s technologies. 

The mapping from the virtual technology, our Virtual Evolvable Hardware 
(EHW) FPGA, to a physical FPGA is a relatively simple mapping as both 
architectures are based on FPGA principles. This mapping is described in JQ. 
The development process itself is the first stage of the mapping. Instead of 
developing to our complex organism (phenotype) we are developing to a simpler 
organism the intertype i.e. the configuration data for our Virtual EHW FPGA. 

This work is a continuation of earlier work into L-systems for development 
of electronic circuits |p. L-systems is a rule based artificial development system 
(see section 0. Designing rules that are suitable for such a purpose and that will 
in fact give rise to the desired phenotype is quite a challenge, even with the help 
of evolution. To understand why evolution has a problem meeting this challenge 
we need to look again at the genotype representation and fitness evaluation. 

The main goal, herein, is to consider the genotype representation i.e. the 
developmental rules. We wish to introduce more knowledge into developmental 
rules so as to tune development towards a given organism i.e. electronic circuit, 
on a Virtual EHW FPGA platform. 

The article is laid out as follows. Section Qintroduces the L-systems concept 
which is the basis for our developmental rules and section ^describes our tech- 
nology platform, how this platform may be seen to be ’’developmental friendly” 
and its computational and communication properties as an organism. Section ^ 
describes our reasoning for introducing knowledge into our developmental rules 
and in section 0we describe the way in which we have introduced knowledge. 
Section ^describes the experimental work of this paper. Section [^summarises 
some of the relevant work in the area of artificial development. 

2 L-Systems 

An L-system is a mathematical formalism used in the study of biological devel- 
opment. One of the main application areas is the study of plant morphology. 
An L-system is made up of an alphabet, a number of ranked rules and a start 
string or axiom. The rules are built up of two types of rules: change and growth 
rules. Ghange rules change the content of the target and growth rules expand 
the target. Applying a rule means finding targets within the search string which 
match the rule condition. This condition is a pattern on the left hand side (LHS) 
of the rule. This condition is also a string and the string is made up of elements 
from the alphabet. Firing the rule means replacing the targets, where possible, 
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with the result of the rule i.e. the right hand side (RHS) of the rule. Firing of the 
rules continues until there are no targets found for any rule or until the process 
is interrupted. 

Previous work into L-systems for circuit design |P, was a first attempt at 
using L-systems in combination with a genetic algorithm to achieve development 
on a digital platform. The rules were given no chosen meaning with respect to 
improving connectivity or logic. Also the rules were non context-sensitive i.e. did 
not rely on neighbouring Sblock information. As such, these may be said to be 
knowledge-poor |0. To begin with, the rules in the L-system experiments were 
generated randomly, evolved, fine-tuned to a certain extent through experimental 
results and then re-evolved. However, with little knowledge in the rules other 
than random strings it was difficult to interpret individual rule effects on the 
final phenotype. 

3 Decreasing Development Complexity — 

Introducing an Intertype 

The Virtual EHW FPGA contains blocks — Shlocks, laid out as a symmetric grid 
where neighbouring blocks touch one another [EJ. There is no external routing 
between these blocks except for the global clock lines. Input and output ports 
enable communication between touching neighbours. Each Sblock neighbours 
onto Sblocks on its 4 sides. 

Each Sblock consists of both a simple logic/memory component and routing 
resources. The Sblock may either be configured as a logic or memory element 
with direct connections to its 4 neighbours or it may be configured as a routing 
element to connect one or more neighbours to non-local nodes. By connecting 
together several nodes as routing elements, longer connections may be realised. 
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Fig. 1. Sblock as a Routing Module 



Figure IQiUustrates a routing Sblock (R) which takes a west input and routes 
it to the North. The logic block (OR) then reads its south input as well as its 
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north input and forwards its result to its west output. If we look at the LUT 
output for the routing module (R) we see that the output is identical to the west 
input (w). As such, the other inputs are treated as don’t cares. The LUT for the 
logic module (OR) will most likely have different input values, but for simplicity 
we consider the same LUT inputs and consider the output values for an OR 
module. The OR column shows that in this case the north and the south inputs 
have been Or-ed and the other inputs are treated as don’t cares. It is worth 
noting, however, that for clarity we have chosen to highlight only the relevant 
outputs in this figure. The single output from an Sblock is, of course, available 
in all directions. 



3.1 Virtual EHW FPGA: Development Friendly Features 

Artificial development could, as in nature, use: pattern formation, morphogene- 
sis, cell differentiation and growth as interrelated processes. Biologically we are 
looking at a cell based organism where the maximum size of our organism is the 
maximum grid size of the technology we are using. Each Sblock is a cell in the 
organism and can communicate with other cells through local communication — 
although limited to 4 neighbours due to the grid structure. An organism (circuit) 
should grow from a single cell (an Sblock) by programming a neighbour cell as 
part of the organism. Growth should only effect neighbours i.e. growth causes 
our growing organism to expand cell-wise within the grid of our technology. In 
addition, our development algorithm will need to allow for cell death, which 
would mean deactivating an Sblock by reprogramming it to the default value of 
non-organism cells. 

Cell differentiation may be achieved by reprogramming a given cell. Varying 
the structure of an sblock through differentiation would mean reprogramming the 
Sblock to connect to or disconnect to individual neighbours. Since the internal 
structure of the sblock is fixed no other structural variations are possible. Varying 
the functionality of the Sblock through differentiation involves programming 
the cell’s internal functionality i.e. its look-up table. It should be noted that 
programming the cell’s connectivity and functionality are intertwined through 
the look up table, as described earlier. 

The development algorithm should control pattern formation within the cells 
based on the cell types. However, what is a cell’s type with respect to Sblocks? If 
as in biology, the type is the same functionality e.g. a muscle cell, then perhaps 
we refer to the functionality of the given block e.g. a 3-input AND gate. The 
type will then describe how the cell functions i.e. how it reacts to various input 
signals. 

Morphogenesis may be defined in terms of the development of form and 
structure in an organism. One of the key ways that this is achieved in biology 
is through change of cell shape. If we regard single Sblocks as cells then mor- 
phogenesis will not be achieved in this way, as each cell has a fixed structure 
and thus size with respect to a given technology. However, it is possible to ab- 
stract slightly from the technology and not only view the grid as an organism 
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but perhaps view the cell itself as an expanding body, consisting of one or more 
cells. 

As such, we see features in the Virtual EHW FPGA which may be termed 
“development friendly”. However, the challenge is to find an artificial develop- 
mental algorithm which will take some form of electronic circuit representation 
and develop it to a functionally correct electronic circuit on this platform. 



3.2 Artificially Developing Sblock-Based Circuits 

Our organism, with respect to development, is the circuit design on our Virtual 
EHW FPGA described by its configuration data. So the building plan described 
by the genotype needs to build a circuit description with respect to the com- 
ponents of the virtual EHW FPGA. During the development phase, at a given 
point in time, the structure and functionality of the developing organism may 
differ from that of the final organism but the computational and communication 
properties of the units under development should be static. That is, the blue 
print includes a description of the computational and communication properties 
of the organism that will remain unchanged during the development process. 

To explain this biologically, if a given DNA can develop into a given organism 
then our building plan needs to develop to a given type of circuit. To describe 
the circuit in its loosest terms, we can describe the circuit as an interconnection 
of Sblocks, where each Sblock has certain computational and communication 
properties. As such, our building plan has to be limited to the computational 
and communication properties of our phenotype i.e. our organism. 

Gonsidering the description of the virtual EHW FPGA given, communication 
available within the organism is limited to neighbour communication with up to 
4 neighbours. Gomputationally, an Sblock can compute any logic function with 
up to 5 inputs. If the full functionality of the Sblock structure is to be used in 
the organism then the genotype should reflect this in its building plan. In the 
L-systems work described in JQ, the building plan enabled communication with 
4 neighbours and any type of Sblock i.e. any 32 bit configuration. 



4 Introducing Knowledge to Developmental Rules 

In earlier work, lack of knowledge within the rules meant that it was hard to gain 
any realistic understanding of good or bad rules with respect to development of a 
given organism. In this section, the introduction of knowledge into developmental 
rules is considered so as to aid our understanding of these rules. 



4.1 Genotype Representation 

In the work presented in JP, evolution was used to try to refine the developmental 
rules. However, this proved to be quite a challenge for evolution. One of the key 
problems lay in the genotype representation. 
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Fig. 2. Genotype-Phenotype mapping and their Respective Solution Space 



Figure ^provides an example of an indirect mapping where a 128 bit geno- 
type is developed to a 3200 bit phenotype. A 128 bit genotype can search a 
solution space of 2^^® different phenotypes of length 128 bit, if we assume a 
direct mapping. If we assume a deterministic development mapping, as shown, 
then we still search 2^^® different phenotypes but this time of length 3200 bit as 
development has extended the phenotype to 3200 bit. As such, the search space 
for the 128 bit genotype is only a very small part of the phenotype search space. 

How do we represent our 128 bit genotype such that it may be developed 
to a 3200 bit phenotype? Using development, the genotype is a building plan 
of how to build the functionality and structure that we wish to achieve in the 
phenotype. However, how do we express this? 

In the work presented in [Q, an axiom and a set of change and growth rules 
was the genotype and both the axiom and the rules themselves were based on 
random strings. The genotype was 440 bits long and the phenotype, a 16x16 
grid of Sblock routers was therefore 16x16x32 bits i.e. 8192 bits. To a certain 
extent the communication and computational properties of the phenotype were 
built into the building plan. Growth was restricted to complete Sblocks i.e. one 
complete 32 bit cell at a time and a given cell was allowed to communicate 
with up to 4 neighbours. Also the computation properties of the Sblock grid 
were retained since a random 32 bit string could produce any type of Sblock. 
However, the building plan did not direct the developing genotype to the actual 
phenotype required i.e. routing Sblocks in the experiments presented. Of the 2®^ 
possible configurations of an Sblock only 4 configurations describe routing blocks. 
As such, development was guiding us towards any phenotype of the set of the 
phenotypes available on a 16x16 grid of Sblocks. We were then using evolution 
to try to refine the rules to rules that might develop to the given phenotype. 

4.2 Introducing Restricted Sblock Types 

In our former L-system implementation, no knowledge as to the circuit goal 
was built into our rules, only knowledge about the virtual technology platform. 
However, if more knowledge is built into the rules of the genotype, one might 
expect that we achieve a more focussed search for the phenotype sought. In 
addition, more knowledge in the rules will allow us to study the development 
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process itself to understand the workings of development somewhat better. That 
is, instead of leaving evolution to wildly explore for solutions in a very large 
search space, through a better understanding of development, evolution can then 
be used to refine the rules through exploitation. 

We, therefore, need a building plan that has knowledge as to the actual 
phenotype to be built. That is, instead of the rules being based on random 
strings, more specific strings should be included in the building plan. This may 
be seen as a form of Sblock type reduction where the variety of Sblock types 
allowed for within the building plan is restricted, from the 2^^ types available 
in our former work, to a subset of these types attuned to the organism to be 
grown. That is, an Sblock may be restricted to one of the following types: a 
3-input NAND gate; a 2-input NOR gate: a west routing Sblock and a North 
routing Sblock. That is only 4 types are allowed. This means that both the axiom 
and the rules in the genotype are limited to strings of these types. Also that the 
communication and computation model of the developed organism is restricted 
to that attainable using these types. 

4.3 Introducing Local Knowledge to the Development Process 

The communication and computational model for the Virtual EHW FPGA are 
intertwined. A description of the computation of an Sblock i.e. its configuration 
includes communication with its neighbours. That is its configuration describes 
its functionality in terms of which neighbours it reacts with and what function 
is computed. As such, an Sblocks functionality is based on local knowledge i.e. 
its state and the state of its connected neighbours under a given configuration. 

A further improvement on the development process may be achieved by build- 
ing local knowledge into the development process itself. To include this feature 
of natural development what we are looking for is that a cell in the developing 
organism is not just developing through a set of rules applying to its own state, 
but to the state of its neighbours as well i.e. induction. In the case of development 
towards the Virtual EHW FPGA, there are 4 neighbours and therefore induction 
will be similar to that seen in cellular automata. Another possibility is to free 
the neighbourhood from the restriction of 4 neighbours but update a given cell’s 
state based on a dynamic neighbourhood where the number of neighbours varies 
over time [^. This is of course not attuned to today’s technology but may be an 
interesting approach to development of circuits on a more futuristic technology. 



5 Knowledge Rich Development 

We wish to adapt our rule system for the experiments herein to a more knowl- 
edge rich system in two ways. The first being the introduction of limited Sblock 
typing to guide development towards a given phenotype and the second being 
to introduce neighbour information to the rules to add local knowledge to the 
development process. Further, a feature of biological development not present in 
our previous L-system work is introduced: that of cell death. 
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In the experiments herein, we limit our Sblock types to one of 5 types. These 
are don’t care DC, empty Z, north router NR, west router WR and XOR 
function XOR. That is, the organisms (circuits) that may be developed are 
limited to those including North or west routers and XOR gates. Empty Sblocks 
signify those that are not configured as part of the circuit. They may also be 
Sblocks that during the development process have been configured and then 
killed i.e. empty. Don’t care types are used to provide more flexibility to the 
rules. They may be used in the condition for a rule (LHS). However, they are not 
permitted as the result of a rule so as to retain determinism in the development 
process. 

Our rules are based on two types of rules (as in L-systems) i.e. change and 
growth rules. These rules are, of course, restricted to expressions consisting of 
up to the 5 Sblock types. In addition, they are context sensitive, relying on 
neighbouring information. As such, both rule types are expressed in terms of 
the type and state of a given Sblock and those of its 4 neighbours to the West, 
North, East and South. It should be noted that the state of a given Sblock is its 
output value i.e. 0 or 1. 



Applying a change to C Applying growth to C 




Fig. 3. Applying Change and Growth Rules 



Figure ^illustrates the workings of the change and growth rules. As shown, 
each of the neighbouring Sblocks (W, N, E and S) and the target Sblock itself 
(C) have a type and state. The result of the rule is the effect on the target 
Sblock. As shown, this effect is described in terms of the change of type, for 
the change rule and a copying of the type to a new Sblock for the growth rule. 
It should be noted that the resulting state of the target, and its copy where 
relevant, are not given in the rule. The resulting states are calculated during the 
development process. 

When a growth rule triggers i.e. it finds a match for its LHS in the Sblock 
array, it grows a new Sblock that is a copy of the target Sblock i.e. C. As this 
is growth, we need to expand the organism i.e. we need to configure an empty 
Sblock. Growth is directed to the direction of most empty Sblocks. Where there 
is an equal amount of empty blocks in any direction, growth is prioritised in the 
following order: W - N - E - S. Change rules on the other hand only effect the 
target Sblock. 
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Fig. 4. Change and Growth on an Sblock Grid 



Figure illustrates the process of applying change and growth rules. A change 
rule has targeted C. This means that the condition of this change rule, including 
types and states of the neighbours and the target match. The change rule result 
is that the target Sblock C is changed from a north router Sblock to an XOR 
Sblock. A growth rule then targets the new C. Again its condition matches the 
types and states of the neighbouring blocks and the new type and state of the 
target C. The growth rule is applied by copying the target Sblock into the empty 
Sblock to the west. 

6 Experiments and Results 

The purpose of these experiments was to investigate if the added knowledge 
built into the rules could develop to specific although simple organisms. As 
such, these experiments may be said to be of a more preliminary nature. The 
experimental platform is a software simulation of a 3x3 Virtual EHW FPGA for 
growth and differentiation and a 4x4 Virtual EHW FPGA for pattern formation. 
The genome was chosen to include an axiom — the type and state of an Sblock, 
together with 3 rules. A rule may be a change or a growth rule and of the format 
given in section 0 

First, a set R of all possible rules was generated. That is all possible combi- 
nations of type and state for the rule elements: C, W, N, E, S and the result. 
From this set R, all possible 3 rule combinations were extracted to provide our 
rule combinations for our set of genomes. Each genome was then developed for 
6 development steps. At each step, up to 3 rules may be active. The resulting 
organism and its genome was stored for each development run. The colour codes 
for the different Sblocks is given in Figure 0 



Achieving Growth 

Our first investigation was to find out if rule combinations could be found that 
indicated reasonable growth. Reasonable growth was expressed as growing to an 
organism where more than 4 Sblocks were configured i.e. non empty, within the 6 
development steps. This result was not hard to find since many rule combinations 
could in fact lead to growth across the complete grid. Further, we wanted to see 
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Result C S E N W 
RuleOne: G XOR DC Z Z DC 

RuleTwo: NR DC DC NR NR Z 

RuleThree: WR XOR NR DC DC NR 




Initial condition Developmental step 1 Developmental step 2 Developmental step 3 Developmental step 4 



Fig. 5. Development Focused on Growth, XOR axiom shown in center in the initial 
condition 



O : Empty 
Sblock 



: North 



Result C S E N W 
RuleOne: WR XOR DC XOR DC Z 

RuleTwo: NR DC DC Z DC WR 

RuleThree: G XOR Z DC DC DC 



Router 



L : West 
' Router 




Axioi 




fSWj 
Initial condition 




Developmental step 3 Developmental step 7 



Fig. 6. Development Focused on Differentiation 



if we could find rules which would grow in a specific way i.e. grow a specific type 
of Sblock. 

In Figure 0 a genotype consisting of three rules and the Sblock array, is 
shown, with an XOR axiom in the centre. The development of the genome is 
shown for the first four development steps. After step four no more rules are 
activated so the growth stops. The shown rule set illustrates a rule combination 
where the type of the axiom itself is grown. That is, it is possible to find a rule 
combination where a more specific growth is achieved. 



Achieving Differentiation 

To investigate differentiation (change) we again began with a more general goal. 
That is, to find developed organisms where more than 4 Sblocks were config- 
ured. However since change rules may not be applied to empty cells, more than 
4 configured cells says more about growth than specialisation. However, since 
growth only copies the type of the mother cell, then lack of specialisation would 
mean that all cells configured would hold the type of the mother cell. As such, 
our criteria for specialisation was set to finding genomes where both more than 4 
cells were configured and there existed 3 cell types in addition to possible empty 
types. Figure Qillustrates a rule combination together with an XOR axiom which 
achieves this criteria. In other words we can achieve differentiation. 
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Result C S E N W 

O : Empty RuleOne: G XOR DC DC DC DC 

Sblock RuleTwo: NR DC XOR DC XOR DC 

RuleThree: Z NR DC NR DC NR 




a: Repeated pattern 
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b: Symetrical pattern 



Fig. 7. Development Focused on pattern formation 



Achieving Pattern Formation 

Having found rules able to grow and specialise, we want to use thise properties 
towards development of an organism consisting of patterns of connected Sblocks. 
Pattern may be expressed as collections of Sblocks with given qualities. The 
two qualities we are looking for in these experiments are a repeated pattern 
and symmetry. Developing repeated patterns in an organism may be related to 
connecting logic gates to obtain a certain logical function. This connection of 
gates, i.e. Sblocks, may be reused in circuit design as a component. The ability 
to divide an organism in two or more symmetrical patterns may be related to 
partitioning of recourses in a circuit design into regions. 

In Figure 0a, the shown rule set develops from an XOR axiom to an organism 
consisting of a pattern repeated four times. Each pattern consists of one XOR 
and two North Routers. The organism shown uses fourteen development steps to 
completion. One of the four patterns is marked together with the placement of 
the axiom. The results is found by storing candidate rules able to make repeated 
2x2 structures of Sblocks in the 4x4 Sblock array. 

The organism in Figure 06 is developed from an XOR axiom to an symmet- 
rical organism, the lower and upper half of the organism is symmetrical. The 
development of this organism took eight development steps. The symmetrical 
sub organism is marked together with the placement of the axiom. The shown 
organism is found by storing rules producing symetrical 4x2 organisms. 



7 Related Work 

The work carried out by Sipper on Quasi-Uniform Cellular Automata [0 ad- 
dresses the issue of universal computation, the rules in this work may be related 
to possible Sblocks in our development rules. Inspired by biological development 
before differentiation, a new family of fault tolerant FPGAs are being developed 
at York |0. Cellular encoding [0 is a well-defined formal approach based on 
biological development. This methodology is a variation of genetic programming 
[0. Similar to L-systems, a grammar-based approach may be used to develop 
neural network topologies. The process is used in experiments to generate large 
repetitive structural systems [ 0 . 
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Abstract The fractal protein is a new concept for improving evolvability, seal- 
ability, exploitability and providing a rich medium for evolution. Here the idea 
of fractal proteins is introduced, and a series of experiments showing how evo- 
lution can design and exploit them within gene regulatory networks is de- 
scribed. 



1 Introduction 

Evolutionary computation does it wrong. 

That’s not to say that an evolutionary algorithm doesn’t evolve solutions. Using 
principles of selection, reproduction with inheritance, and variability, we evolve good 
solutions to many problems all the time. The trouble is, to make an evolutionary algo- 
rithm evolve, you often need to be trained in the black arts of genetic representation 
wrangling and fitness function fiddling. 

Along the way you’ll get a few glimpses of the creativity of evolution through the 
bugs in your code: the little loopholes that are ruthlessly exploited by evolution to 
produce unwanted and invalid solutions. Music with notes inaudible to the human ear, 
designs with components floating in mid-air, fraud detectors that detect everything 
and yet find nothing useful, virtual creatures that sidestep inertia and flick themselves 
along in bizarre ways. Each result fascinating, and each prevented by the addition of 
another constraint by the developer. The bugs are never reported in any publication, 
and yet they point to the true capabilities of evolution. 

These capabilities become even more apparent as evolution is embodied in physi- 
cal media. Researchers such as Thompson [Woolf and Thompson, 2002] or Linden 
[2002] have shown how exotic designs of electronic circuits or antenna that exploit 
the physical properties of different media can be readily evolved. Such solutions 
could equally have been called bugs, for the unconventional nature of their designs 
made them difficult to use or even understand. But in truth, these results throw light 
on the nature of evolution. 

Evolution is not an optimiser. It is a satisficer, exploiting every conceivable (and 
inconceivable) way to make the minimum necessary adaptation. Evolution’s greatest 
and unrivalled skill is its ability to find and use every avenue to achieve almost noth- 
ing. In nature it is able to do this trick often enough to generate ingenuity and com- 
plexity beyond our comprehension. 

In nature it can achieve this. But in our computers, suppressed by constraints, lim- 
ited by stifling representations and severe operators, evolution can barely move 
through a search space. And even when evolution is able to evolve effectively, the 
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problem it solves is usually so simplified and isolated from the true real-world prob- 
lem that the evolved solution is irrelevant. 

Work at UCL on a new project called MOBIUS (modelling biology using smart 
materials) is focusing on these issues. The aims of this research are to free evolution 
from the traditional constraints imposed by evolutionary computation (EC). Smart 
materials will be used to provide a rich environment in which evolution can be em- 
bodied [Quick et al, 1999], enabling the intrinsic evolution of potentially unconven- 
tional and diverse morphologies. Evolution will employ new genetic representations 
based on natural developmental processes, designed to be evolvable, scalable and free 
of constraints. This paper describes the initial research on this topic: how evolved 
genes can be expressed into fractal proteins that form themselves into complex and 
desirable gene regulatory networks. 



2 Background 

2.1 Development 

Development is the set of processes that lead from egg to embryo to adult. Instead of 
using a gene for a parameter value as we do in standard EC (i.e., a gene for long legs), 
natural development uses genes to define proteins. If expressed, every gene generates 
a specific protein. This protein might activate or suppress other genes, might be used 
for signalling amongst other cells, or might modify the function of the cell it lies 
within. The result is an emergent “computer program” made from dynamically form- 
ing gene regulatory networks (GRNs) that control all cell growth, position and behav- 
iour in a developing creature. (An introduction to biological development can be 
found in [Bentley 2002].) 

There is currently much research being performed on computational development. 
Problems of scalability, adaptability and evolvability have led many researchers to 
attempt to include processes such as growth, morphogenesis or differentiation in their 
evolutionary systems [Haddow et al, 2001; Miller 2002]. For recent reviews see 
[Kumar and Bentley, 2002] and [Gordon and Bentley, 2002]. 

Research at UCL has been focusing on development for some years. Kumar has 
developed extensive modelling software capable of evolving genes that are expressed 
into proteins, forming GRNs that control the expression of the genes and the devel- 
opment (growth, placement, function) of cells [Kumar and Bentley, 2003]. Gordon 
has been investigating the use of development-inspired evolutionary systems for ev- 
olvable hardware, which makes use of ideas such as differentiation and growth 
[Gordon and Bentley, 2002]. 

The research described here complements these systems, by focusing on methods 
to enrich the genetic space in which evolution searches. Specifically, this paper intro- 
duces the novel idea of fractal proteins. Here, a gene defines a protein constructed 
from a subset of the Mandelbrot Set. 



2.2 Mandelbrot Set 

Given the equation = x/ H- c where x, and c are imaginary numbers, Benoit Man- 
delbrot wanted to know which values of c would make the length of the imaginary 
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number stored in x, stop growing when the equation was applied for an infinite num- 
ber of times. He discovered that if the length ever went above 2, then it was un- 
bounded - it would grow forever. But for the right imaginary values of c, sometimes 
the result would simply oscillate between different lengths less than 2. 

Mandelbrot used his computer to apply the equation many times for different val- 
ues of c. For each value of c, the computer would stop early if the length of the imagi- 
nary number in x, was 2 or more. If the computer hadn’t stopped early for that value 
of c, a black dot was drawn. The dot was placed at coordinate (m, n) using the 
numbers from the value of c: (m + ni) where m was varied from -2.4 to 1 .34 and n 
was varied from 1.4 to -1.4, to fill the computer screen. The result was the infinite 
complexity of the “squashed bug” shape we know so well today. [Mandelbrot, 1982] 



3 Fractal Proteins 

One of the factors that limits evolution is its genetic representation. Previous work has 
shown how component-based representations (knowledge-poor representations with 
few constraints) enable greater diversity and creativity by evolution [Bentley, 2000]. 
Here, this idea has been combined 
with ideas of development and frac- 
tals, to produce a representation with 
as few constraints and as much rich- 
ness as possible. 

3.1 Representation 

Currently in this representation, there 
exist: 

- fractal proteins, defined as subsets 
of the Mandelbrot set. 

- environment, which can contain 
one or more fractal proteins, and 
one or more cells. 

- cell, which contains a genome and 
cytoplasm, and which has some 
behaviours. 

- cytoplasm, which can contain one or more fractal proteins. 

- genome, which comprises structural genes and regulatory genes. In this work, the 
structural genes are divided into different types: a cell receptor gene and one or 
more behavioural genes. 

- cell receptor gene, a structural gene which acts like a mask, permitting variable 
portions of the environmental proteins to enter the corresponding cell cytoplasm. 

- behavioural gene, a structural gene comprising operator (or promoter) region and a 
cellular behaviour region. 

- regulatory gene comprising operator (or promoter) region and coding region. 
Figure 1 illustrates the representation. 




Fig. 1. Representation using fractal proteins. 
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3.2 Defining a Fractal Protein 

In more detail, a fractal protein is a finite 
square subset of the Mandelbrot set, defined 
by three codons (x,y,z) that form the coding 
region of a gene in the genome of a cell. 
The gene is expressed as a protein by calcu- 
lating the square fractal subset with centre 
coordinates (x,y) and sides of length z, see 
fig. 2 for an example. In this way, it is pos- 
sible to achieve as much complexity (or 
more) compared to protein folding in na- 
ture. 




Fig. 2. Example of a fractal protein 
defined by (x = 0.132541887, y = 
0.698126164, z = 0.468306528) 



3.3 Fractal Chemistry 

Cell cytoplasms and the environment usu- 
ally contain more than one fractal protein. 
In an attempt to harness the complexity 
available from these fractals, multiple pro- 
teins are merged. The result is a product of 
their own “fractal chemistry” which natu- 
rally emerges through the fractal interac- 
tions. 

Fractal proteins are merged (for each 
point sampled) by iterating through the 
fractal equation of all proteins in “parallel”, 
and stopping as soon as the length of any is 
unbounded (i.e. greater than 2). Intuitively, 
this results in black regions being treated as 
though they are transparent, and paler re- 
gions “winning” over darker regions. See 
fig 3 for an example. 




Fig. 3. Two fractal proteins (top) and the 
resulting merged fractal protein combi- 
nation (bottom). 



3.4 Genes 

The cell receptor gene, regulatory, and behavioural genes all contain 7 real-coded 
values (codons): 



xp 


yp 


zp 


thresh 


X 


y 


Z 



where (xp,yp,zp,thresh) defines the promoter (operator or precondition) for the gene 
and (x,y,z) defines the coding region of the gene. When thresh is a positive value, one 
or more proteins must match the promoter shape defined by (xp,yp,zp) with a differ- 
ence equal to or lower than thresh for the gene to be activated. When thresh is a nega- 
tive value, one or more proteins must match the promoter shape defined by {xp,yp,zp) 
with a difference equal to or lower than \thresM for the gene to be repressed (not acti- 
vated). 
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To calculate whether a gene should be activated, all fractal proteins in the cell cy- 
toplasm are merged (including the masked environmental proteins, see below) and the 
combined fractal mixture is compared to the promoter region of the gene. 

The similarity between two fractal proteins (or a fractal protein and a merged frac- 
tal protein combination) is calculated by sampling a series of points in each and sum- 
ming the difference between all the resulting values. (Black regions of fractals are 
ignored.) Given the similarity matching score between cell cytoplasm fractals and 
gene promoter, the activation probability of a gene is given by: 



activationprob = (1 -i- tanh({matchnum — thresh - Ct) / Cs)) / 2 
where: matchnum is the matching score, 

thresh is the matching threshold from the gene promoter 
Ct is a threshold constant (normally set to 50) 

Cs is a sharpness constant (normally set to 50) 



Regulatory gene. Should a regulatory 
gene be activated by other protein(s) in the 
cytoplasm matching its promoter region, 
its corresponding coding region (x,y,z) is 
expressed (by calculating the subset of the 
Mandelbrot set). This is added to the cell 
cytoplasm (at the end of the developmental 
cycle), which holds all current proteins. 

Cell receptor gene. At present, the prom- 
oter region of the cell receptor gene is 
ignored, and this gene is always activated. 
As usual, the corresponding coding region 
(x,y,z) is expressed by calculating the 
subset of the Mandelbrot set. However, the 
resultant fractal protein is treated as a mask 
for the environmental proteins, where all 
black regions of the mask are treated as 
opaque, and all other regions treated as 
transparent, see fig. 4. 




Fig. 4. Cell receptor protein (top left), 
environment protein (top right), resulting 
masked protein to be combined with cyto- 
plasm (bottom). 



Behavioural Gene. Should a behavioural gene be activated by other protein(s) in the 
cytoplasm matching its promoter region, then its cell behavioural region (x,y,z) is 
expressed. These three real values are decoded to specify a range of different cellular 
functions, depending on the application. Interestingly, experiments indicate that the 
matching score of the promoter region should not be taken into account when decod- 
ing the output behaviours. (If it is, evolution designs generalized proteins that “cheat” 
and match too many different gene promoter regions). 



3.5 Fractal Sampling 

Fractal proteins are normally stored in memory as lists of constructor (x,y,z) values 
rather than as bitmaps. All fractal calculations (masking, merging, comparisons) are 
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performed at the same time, by sampling the fractals at a resolution of 15x15 points. 
During a comparison, where differences between fractals are found to be larger than a 
preset threshold, the fractals are sub-sampled in a higher resolution in those regions. 
Figure 5 illustrates the calculations made by the computer. 

3.6 Development 

An individual begins life as a single cell in a 
given environment. To develop the individual 
from this zygote into the final phenotype, frac- 
tal proteins are iteratively calculated and 
matched against all genes of the genome. 

Should any genes be activated, the result of 
their activation (be it a new protein, receptor or 
cellular behaviour) is generated at the end of 
the current cycle. Development continues for d 
cycles, where d is dependent on the problem. 

Note that if one of the cellular behaviours 
includes the creation of new cells, then devel- 
opment will iterate through all genes of the 
genome in all cells. 

3.7 Evolution 

The genetic algorithm used in this work has been used extensively elsewhere for other 
applications (including GADES [Bentley 1999]). A dual population structure is em- 
ployed, where child solutions are maintained and evaluated, and then inserted into a 
larger adult population, replacing the least fit. The fittest n are randomly picked as 
parents from the adult population. The degree of negative selection pressure can be 
controlled by modifying the relative sizes of the two populations. Likewise the degree 
of positive selection pressure is set by varying n. When child and adult population 
sizes are equal, the algorithm resembles a canonical GA. When the child population 
size is reduced, the algorithm resembles a steady-state GA. Typically the child popu- 
lation size is set to 80% of the adult size and n = 40%. (For further details of this GA, 
refer to [Bentley 1999].) 

Unless specified, alleles are initialised randomly, with (xp,yp,zp) and (x,y,z) values 
between -1.0 and 1.0 and thresh between -10000 and 10000. The ranges and preci- 
sion of the alleles are limited only by the storage capacity of double and long ‘C’ data 
types - no range constraints were set in the code. Currently genomes of a fixed num- 
ber of genes are used. 

Genetic Operators. Genes are real-coded, so the crossover operator simply performs 
uniform crossover, randomly picking alleles from the two parents to construct the 
child genome. Mutation is more interesting, particularly since these genes actually 
code for proteins in this system. There are three main types of mutation used here: 

1. Creep mutation, where (xp,yp,zp) and (x,y,z) values are incremented or decre- 
mented by a random number between 0 and 0.5 and thresh is incremented or dec- 
remented by a random number between 0 and 16384. 




Fig. 5. Comparing fractal proteins 
(top) - what the computer “sees” 
(bottom). 
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2. Duplication mutation, where a (xp,yp,zp) or (x,y,z) region of one gene randomly 
replaces a ixp,yp,zp) or (x,y,z) of another gene. (This permits evolution to create 
matching promoter and coding regions of different genes quickly.) 

3. Sign flip mutation, where the sign of thresh is reversed. 

Crossover is always applied; all mutations occur with probability 0.01 per gene. 



3.8 Discussion 

The key idea behind this representation is the concept of the fractal protein. Express- 
ing a simple gene as a highly complex fractal form captures some of the richness and 
diversity seen in natural protein folding. By allowing these fractals to combine and 
interact with each other, a fractal chemistry as complex as natural chemistry is 
formed. And because the space of fractal subsets defined by genes is coherent (similar 
fractals are near to each other), redundant (similar fractals are found in many different 
places), and infinite (limited only by computer memory), evolvability and exploitabil- 
ity of this representation should be impressive. 

Interestingly, the representation flies in the face of conventional wisdom: rather 
than make the search space as small as possible, the space is made as close to infinite 
as computers can manage (using the data types described). But despite such vast ge- 
netic spaces, it will be shown that evolution is able to find good solutions without 
difficulty - proving that evolvability, not size of search space is critical in evolution. 



4 Evolving Distinct Fractal Proteins 

Before attempting to evolve genes that can regulate behaviour via protein interactions, 
the first experiments simply checked to see if sensible fractal proteins could be 
evolved at all. In a vast search space, (although highly ordered) it was a vital test for 
evolution. For this experiment, no developmental cycle was used (i.e., only the coding 
regions of regulatory genes were evolved; promoters, receptor genes, and the envi- 
ronment were not used). A fitness function was created which produced a selection 
pressure towards the creation of 10 distinct and identifiable (amongst others) fractal 
proteins. In more detail, it maximised: 

1. the differences between each protein and the other 9. 

2. the differences between all 10 merged proteins and all 9 merged proteins (exclud- 
ing each protein in turn).* 

The goal of this experiment was to generate 10 fractal proteins that could be 
merged together and still have part of every protein identifiable in the mixture. Small 
population sizes of 20 (child) and 25 (adult) were used, for a maximum of 100 gen- 
erations. 20 runs were performed. 



* This fitness function took a little while to develop. Initial functions that simply maximized the 
difference between each protein and the other 9 merged proteins, resulted in a little bit of 
creativity by evolution. Without fail, a single white protein would evolve, with the other 9 
being mostly identical fractals. When merged, the white protein dominates or “sits on top” of 
all the others. As long as none of the other 9 were white, they would be equally different 
from the merged group, whether identical to each other or not. 
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4.1 Results 

Figure 6 shows an example of 10 fractals evolved in one of the runs. Performance was 
consistently good, and it is clear that despite the somewhat non-traditional nature of 
the search space, evolution has successfully found 10 distinct and different fractal 
subsets. Indeed, although results varied, at no time did evolution appear to become 
“stuck” - a healthy gradual change was always observed. And with an infinite amount 
of fine-tuning possible to the fractals, this seems very satisfactory. 

Figure 6 (right) shows the result of merging all 10 proteins into one. Although dif- 
ficult to see, it is possible to trace distinct portions of all ten in the merged product. 
Evolution has produced a complicated jigsaw of interlocking fractal proteins, as de- 
sired. 



5 Evolving Gene Regulatory Networks 

5.1 Experiment 1 

Having shown that evolution can create and modify fractal proteins without difficulty, 
the next step was to attempt the evolution of gene regulatory networks. A single cell 
with 4 regulatory genes and 2 behavioural genes was used. The genes were randomly 
initialised with the alleles that defined the protein fractals shown in fig. 6. The envi- 
ronment and cell receptor gene were not used. Five developmental steps were em- 
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ployed, and the system ran for 25 generations. All other parameters were as described 
above. A simple fitness function was used: it measured the output from the behav- 
ioural genes at every devel- 
opmental step, and calculated 
how much this output devi- 
ated from a predefined pat- 
tern, see table 1 . 



5.2 Results 1 



Although other representa- 
tions had struggled to achieve 
any desired GRN pattern, in every run performed for this experiment, the desired 
pattern was found within 5 generations. Figure 7 shows an example of one typical 
solution. 

As should be clear, all four regula- 
tory genes have evolved to become 
active in the absence of proteins in the 
cytoplasm. Since development begins 
with no proteins (the environment was 
not used), all four become active on 
the first step and produce proteins. In 
the second development step, the pres- 
ence of proteins regulates the four 
regulatory genes and they become 
inactive. In the third step, the absence 
of proteins makes them active again, 
and so on. To create the desired pat- 
tern, the first behavioural gene also 
evolved to become active in the ab- 
sence of proteins, and so oscillates 
with the regulatory genes. The second 
behavioural gene evolved to become 
active in the presence of protein(s) 
produced by the regulatory genes, and 
so oscillates against them. 

This is clearly a very simple GRN that only required one regulatory gene, but it 
demonstrates an important feature: the evolution of a desired pattern of gene activa- 
tion during development. Also note how the behaviour of all four regulatory genes 
has converged (something that consistently occurred in all runs, despite early solu- 
tions containing some very intricate GRN patterns). Again, such behaviour makes 
sense: the best way to produce the desired output pattern in a robust and stable way is 
to use all four regulatory genes to reinforce the underlying “clock signal”. It is a 
lovely demonstration of evolution building redundancy and graceful degradation into 
the solution. The desired output is generated even if one or two of the regulatory 
genes mutate. 




Fig. 7. Simple evolved GRN. A high indi- 
cates a gene active because it has not been 
repressed. A high ‘-t-’ indicates a gene acti- 
vated by a match to its promoter. A low indi- 
cates the gene is inactive. 



Table 1. Desired output pattern to be created by GRN m 
developing phenotype 



Developmental 


Behavioural 


Behavioural 


Step 


gene 1 


gene 2 


1 


Activated 


Not activated 


2 


Not activated 


Activated 


3 


Activated 


Not activated 


4 


Not activated 


Activated 


5 


Activated 


Not activated 
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5.3 Experiment 2 



Table 2. Desired output pattern to be created by GRN in 
developing phenotype 



With evolution clearly able to 
find a simple GRN pattern, a 
second experiment was per- 
formed, with a rather more 
complex pattern set as the 
target, see table 2. Instead of 
requiring a simple oscillator 
from each behavioural gene, 
this target required ‘B gene 
r to be active on step 1 and 5, but 
inactive on steps 2, 3 and 4, and the 
opposite from ‘B gene 2’ . Effectively, 
the target requires a GRN to count to 
3 in order to construct such a pattern. 
All other system settings were as 
described for the first experiment. 



5.4 Results 2 



The results were unexpectedly good. 

In 13 out of 20 runs, the solutions 
were two gene activations away from 
the perfect solution. In 1 run the solu- 
tion was a single gene activation away 
from the perfect solution. Perfect 
solutions were found in 5 out of 20 
runs. Given that very small population 
sizes were used and the GA only ran 
for 25 generations, these results are 
considered impressive. Figures 8 and 
9 show two examples of a perfect 
solution. 

In the first example (fig. 8) the de- 
sired activation patterns from the two 
behavioural genes were produced by 
an ingenious gene regulatory network 
that appears to act like a counter. 

Regulatory gene 4 triggers regulatory 
gene 1 , which triggers regulatory gene 
3. In the second example (fig 9), the desired patterns appear to result from an evolved 
“clock signal” provided by regulatory genes 3 and 4. In both examples, if the GRN is 
permitted to run for more time steps, the same behavioural pattern is repeated indefi- 
nitely. Also in both cases, the regulatory genes seem to act like a hidden layer in a 
neural network, with the behavioural genes somehow “sitting on top” of many diverse 
patterns and choosing to exploit or extract only those patterns that result in the desired 




Fig. 8. Evolved GRN showing counting behav- 
iour using regulatory genes 1, 3 and 4. 




Fig. 9. Evolved GRN showing counting behav- 
iour derived from an evolved “clock signal” 
using regulatory genes 1, 2 and 3, 4. 



Developmental 


Behavioural 


Behavioural 


Step 


gene 1 


gene 2 


1 


Activated 


Not activated 


2 


Not activated 


Activated 


3 


Not Activated 


Activated 


4 


Not activated 


Activated 


5 


Activated 


Not activated 
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output behaviour. Many other examples of GRNs were also observed, each using 
different ‘ideas’ to achieve the same results. 



6 Conclusions 

Ideas of evolvability, scalability, exploitability and embodiment in a rich media have 
led to the development of a novel representation based on fractal proteins. This paper 
has introduced the concept of fractal proteins and shown that genes defining these 
complex forms can be evolved effectively. Two further experiments demonstrated 
that evolution can create regulatory genes that harness the complexity of fractal pro- 
teins. These regulatory genes were used to form intricate gene regulatory networks, 
that output desired behavioural patterns. 

The work described here is only a few weeks old. There remains considerable re- 
search to investigate the evolution of more complex GRNs, the effects of environment 
and cell receptor genes, cellular behaviours and eventually the evolution of multicel- 
lular forms. This research will be linked with work into new smart materials and the 
creation of novel morphologies and controllers. 
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Abstract. A review is given of approaches to growing neural networks and 
electronic circuits. A new method for growing graphs and circuits using a de- 
velopmental process is discussed. The method is inspired hy the view that the 
cell is the basic unit of biology. Programs that construct circuits are evolved to 
build a sequence of digital circuits at user specified iterations. The programs 
can be run for an arbitrary number of iterations so circuits of huge size could be 
created that could not be evolved. It is shown that the circuit building programs 
are capable of correctly predicting the next circuit in a sequence of larger even 
parity functions. The new method however finds building specific circuits more 
difficult than a non-developmental method. 



1 Introduction 

Natural evolution builds organisms using a process of biological development. In this 
a fertilized cell begins a process of replication and differentiation that culminates in 
an organism made of a huge number of specialist cells. Each cell carries the same 
genetic information yet somehow this decentralized, distributed community of cells 
builds a whole organism. Human beings design things in a top-down manner. Elec- 
tronic circuits are built from high-level specification, to synthesis of symbolic com- 
ponents. These are then converted into transistor-level designs and finally wafers of 
silicon and wires are created. This method of design will become increasingly unten- 
able as the size of electronic components continues to shrink. 

In evolutionary computing a problem of scalability has become evident. The word 
scalability has a number of meanings: the evolution time increases markedly with 
problem size, the genotype length is proportional to problem size, the density of solu- 
tions in the search space associated with larger problems is a sharply decreasing func- 
tion, and the time for fitness evaluation increases rapidly with problem size. The 
problem is particularly evident in the evolution of neural networks, where each link 
requires a floating-point weight that must be determined. In Genetic Programming 
researchers have generally concentrated on relatively small and simple problem such 
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as parity functions and the Santa Fe Ant Trail. This has partly been motivated by 
scientific reasons in that these, though small are sufficiently difficult benchmarks for 
the evaluation of new techniques. Koza has been able to produce large and complex, 
human competitive analog circuit designs, but only by providing enormous computa- 
tional power [19]. The problems of scalability have partly motivated a number of 
attempts to consider the use of a developmental approach to evolutionary design. In 
the field of evolvable hardware researchers have also tended to concentrate on rela- 
tively small circuit design problems. In digital circuit design the scalability problem is 
particularly acute [26] . 



2 Lindenmeyer Systems, Graph Re-writing 
and Developmental Approaches 

A number of researchers have studied the potential of Lindenmeyer systems [20] for 
developing artificial neural networks (ANNs) and generative design. Boers and Kui- 
per have adapted L-systems to develop the architecture of artificial neural networks 
(ANNs) (numbers of neurons and their connections) [3]. They used an evolutionary 
algorithm to evolve the rules of a L-system that generated feed-forward neural net- 
works. Backpropagation was used, and the accuracy of the neural networks on test 
data was assigned to the fitness of the encoded rules. They found that this method 
produced more modular neural networks that performed better than networks with a 
predefined structure. Kitano had developed another method for evolving the architec- 
ture of an artificial neural network [17] using a matrix re-writing system that manipu- 
lated adjacency matrices. Although Kitano claimed that his method produced superior 
results to direct methods (i.e. a fixed architecture, directly encoded and evolved), it 
was later shown in a more careful study that the two approaches were of equal quality 
[23]. Gruau devised an elegant graph re-writing method called cellular encoding [10]. 
Cellular encoding is a language for local graph transformations that controls the divi- 
sion of cells that grow into artificial neural networks. This method was shown to be 
effective at optimizing both the architecture and weights at the same time, and they 
found that, to achieve as good performance with direct encoding, required the testing 
of many candidate architectures [11]. Others have successfully employed this ap- 
proach in the evolution of recurrent neural networks that control the behaviour of 
simulated insects [18]. Koza has successfully employed a modified cellular encoding 
technique to allow the evolution of programs that produce human-competitive de- 
signs, especially in analog circuit design [19]. Recently Hornby and Pollack have also 
evolved context free L-systems to define three dimensional objects (table designs) 
[14]. They found that their generative system could produce designs with higher 
fitness and faster, than direct methods. 

Jacobi created an impressive artificial genomic regulatory network, where genes 
code for proteins and proteins activate (or suppress) genes [16]. He used the proteins 
to define neurons with excitatory or inhibitory dendrites. This allowed him to define a 
recurrent ANN that was used to control a simulated Khepera robot for obstacle avoid- 
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avoidance and corridor following. Nolfi and Parisi evolved encoded neuron position 
and branching properties of axonal trees that would spread out from the neurons and 
connect to other neurons [22] and in later work introduced cell division using a 
grammar [5]. Astor and Adami have created a developmental model of the evolution 
of ANN that utilizes an artificial chemistry [1], 

Eggenherger suggests that the complex genotype-phenotype mappings typically 
employed in developmental models allow the reduction of genetic information with- 
out losing the complex behaviour. He stresses the importance of the fact that the 
genotype will not necessarily grow as the number of cells, thus he feels that devel- 
opmental approaches will scale better on complex problems [8]. Sims evolved the 
morphology and behaviour of simulated agents [24]. Bongard and Pfeifer have 
evolved genotypes that encode a gene expression method to develop the morphology 
and neural control of multi-articulated simulated agents [4]. 

Bentley and Kumar examined a number of genotype-phenotype mappings on a 
problem of creating a tessellating tile pattern [2]. They found that the indirect devel- 
opmental mapping (that they referred to as an implicit embryogeny) could evolve the 
tiling patterns much quicker, and further, that they could be subsequently grown 
(iterated) to much larger sized patterns. One drawback that they reported, was that the 
implicit embryogeny tended to produce the same types of patterns. 

Hemmi, Mizoguchi and Shimohara evolved the rules of a rewriting system to de- 
fine HDL programs [12]. Recently interest in developmental approaches in Evolvable 
Hardware has begun to increase. Haddow, Tufte and van Remortel have considered 
the use of Lindenmeyer systems for digital circuit design [13]. Their "cells" are like 
configurable logic blocks in EPGAs and cell interactions are defined by neighbouring 
cells in a two-dimensional grid system. They note that there are many epistatic effects 
that make evolution of the genotype difficult. Gordon and Bentley have compared a 
developmental evolutionary approach with a non-developmental encoding. They 
found that the could evolve these direct mappings more easily [9]. Edwards also ex- 
amined developmental genotypes and possible physical implementations [7]. 



3 Cartesian Genetic Programming for Circuits 

Cartesian Genetic Programming was developed from methods developed for the 
automatic evolution of digital circuits [21]. Since electronic circuits can be repre- 
sented by graphs (often referred to as netlists) it was natural to generalize it to the 
general problem of evolving computer programs. CGP represents a program or circuit 
as a list of integers that encode the connections and functions. The representation is 
readily understood from a small example. Consider a one bit binary adder circuit. 
This has three inputs that represent the two bits to be summed and the carry-in bit. It 
has two outputs: sum and carry-out. A possible implementation of this is shown be- 
low: 
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Fig. 3.1. One-bit adder circuit. 



CGP employs an indexed list of functions that represent in this example various 
two input logic gates and also three input multiplexers. Suppose that XOR is function 
10 and MUX is function 16. The three inputs A, B, Cin are labeled 0, 1, 2. The output 
of the left (right) XOR gate is labeled 3 (5). The output of the MUX gate is labeled 6. 
In CGP, this circuit could have the following genotype representation: 

010 10 2026 023 16 322 10 

The integers in bold represent the functions, and the others represent the connec- 
tions between gates, however, if it happens to be a two input gate then the third input 
is ignored. It is assumed that the circuit outputs are taken from the last two nodes. 
The second group of four integers (shown in grey) represent an AND gate that is not 
part of the circuit phenotype (hence doesn’t appear in Fig 3.1). Since we are only 
interested in feedforward circuits here, it is important to note that the connections to 
any gate can only refer to gates that appear on its left. Typically in CGP we use point 
mutation (that is constrained to respect the feedforward nature of the circuit). Sup- 
pose that the first input of the MUX gate (0) was changed to 4. This would connect 
the AND gate into the circuit. Similarly, a point mutation might disconnect gates. 
Thus it is clear that in CGP we use a many to one genotype-phenotype mapping, as 
redundant nodes may be changed in any way and the genotypes would still be de- 
coded to the same phenotype. We use an (1 h- 4)-ES evolutionary algorithm that uses 
characteristics of this genotype-phenotype mapping to great advantage (i.e. genetic 
drift). This is given below: 

1 . Generate 5 chromosomes randomly to form the population 

2. Evaluate the fitness of all the chromosomes in the population 

3. Determine the best chromosome (called it current _best) 

4. Generate 4 more chromosomes (offspring) by mutating the current_best 

5. The current_best and the four offspring become the new population 

6. Unless stopping criterion reached return to 2 

Step 3 is a crucial step in this algorithm: if more than one chromosome is equally 
good then the algorithm always chooses the chromosome that is not the current_best 
(i.e. equally fit but genetically different). In a number of studies this step has been 
proved to allow a genetic drift process that turns out be of great benefit [25][29]. In 
later sections we refer to the mutation rate, this is the percentage of each chromosome 
that is mutated in step 4. 
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4 Developmental Cartesian Genetic Programming 

The philosophical standpoint for the form of development described here is that the 
cell is the basic unit of biology. We see evolution as a process of evolving a cell. The 
cell is a very clever piece of machinery that can in co-operation with an environment 
self-replicate and differentiate to form a whole organism. Thus in developmental 
Cartesian GP (DCGP) we attempt to evolve a cell that can construct a larger program 
by iteration of the cell's program in its environment'. To elaborate this further, the 
idea is to define a program (encoded as a CGP graph) that is run inside the nodes 
(cells) of another graph (that is the phenotype or organism, or in this case the final 
digital circuit). It is important to note that in the context of the organism being a digi- 
tal circuit, the cell is identified with a logic gate. All the gates of the final circuit are 
produced by running the same program inside all the nodes (gates) of the circuit 
graph for a given number of iterations. In the version of developmental CGP de- 
scribed here, we define the “environment” of a given cell to be the position of the cell 
and the integer labels of the cells that are connected to that cell. The analogues be- 
tween the abstract graph re-writing rules and real biological entities are given below: 

DNA bases: Cartesian Cell Genotype; Translation: Genotype-Phenotype decoding 
Positional information: Node position; Cell division: Node replication 
Cell type: node function type (gate); Zygote: seed node 
Neighbouring cell signals: Node inputs (gate inputs) 

Of course, this is a highly idealized environment and further investigations are 
planned that will take into account the functions of the cells that are connected to the 
cell in question (this will allow semantic information to affect cell function). Imagine 
that we start with a 'seed' cell, this can only be connected to the program inputs. Here 
is one possible seed (and the one used for all the experiments described in this paper): 

0 



«.-i 

Fig. 4.1. Seed cell with n. program 
ble gates. 

Now we want to run a program inside the cell that uses information about the cell 
and its environment to construct a new one, indeed, we also want it to be able to rep- 
licate itself, so that we can grow a larger program. The cell needs four pieces of in- 
formation, its two connections, its function and its position and the program inside 




0 

inputs (0- M, ,), g„ denotes the first gate type in a list of possi- 



* The technique described here is much more biologically motivated than either L-systems or 
cellular encoding in addition the re-writing rules have the an enormously richer space of 
transformations that are possible with either, and with a much simpler conceptual apparatus. 
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the cell needs to take this information and create a new node with connections and a 
function and whether the cell should be duplicated. So the situation is now 




Fig. 4.2. Cell with program inputs and outputs. 

The cell program is a mapping from the integers that define the cell's connections, 
function and position to a new set of integers defining its new connections, function 
and whether it will replicate itself. Since we are using this program to construct a 
circuit that is feed-forward , the integers that come out of the node program must take 
the correct values (i.e. the connections must be to nodes on the left of our current 
position, the functions must be ones on the list of valid functions, and finally the 
divide must be 0 or 1). Unfortunately it is very difficult to construct programs that 
will automatically respect the feed-forward requirement. After we have run the cell 
program we will need to carry out some sort of operation to bring the numbers into 
the correct ranges. A simple way to do this is to apply a modulo operation (note for 
the new function F' we use the X mod as the address in the function look up table). 
Finally, we are left with deciding how to allow the evolution of the program inside 
the cell. This can be accomplished using CGP with primitive functions that 
manipulate integers. 




Fig. 4.3. Inside a cell is a Cartesian program defined using arithmetic functions. The phenotype 
nodes are valid feed-forward nodes due to the modulo operations. 

To construct a circuit we run a single Cartesian program in each cell of a 
developing circuit (organism). This is called an iteration. Then at each iteration, we 
execute the same program. After a user defined number of iterations we test the 
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developed circuit against the specification for the circuit (i.e. we count how many bits 
are correct). To clarify this still further let us consider an example of an evolved cell 
program that constructs the even-parity function with four inputs. Suppose that there 
are two primitive arithmetic functions /„ and/^ of arity two, defined as follows: 

#1 x> y (4.1) 

fo= fo(x,y) = x+y , f5= f5ix,y) = y 

]0 x< y 

Consider the following genotype 

node labels 4 5 6 7 8 9 10 11 

Cartesian genotype 125 440 325 205 210 870 430 155 

There are four inputs: connection! (A), connection2 (B), function (F), position (P) 
with labels 0, 1, 2, 3. The outputs of the program are taken from nodes 8, 9, 10, 11 
and these outputs are denoted Z, Y, X, W (as seen in Fig. 4.3). Note that in this 
genotype node 6 is not referenced by nodes on its right, so is redundant. We can 
decode the genotype into a set of arithmetic rules dependent on the four inputs. 

Z = F + B , Y = Z + f^(F,A) , (4.2) 

X=P + f^(B,F) , W = f5(B,2f5(B,F)) 

To clarify the origin of these rules, consider the expression for W, this is the output of 
node 11, the first input is 1 (B), the second input refers to node 5. This is the addition 
of its two inputs, both of which refer to node 4. Node 4 is function 5 acting on it two 
inputs 1 and 2, that is, B and F (hence 2f^(B,F) is the second argument of in W). 
Armed with the rules of equation 4.2 we can apply them to the seed cell and carry out 
the first iteration. Note that in this example we are using two functions in the 
phenotype, XOR (labelled as decimal 10) and XNOR (labelled as decimal 11). We 
are considering even-4 parity which has four inputs (labelled 0, 1, 2, 3). We found 
that the program worked much better when we allowed the right daughter cell to 
differentiate again (disallowing further replication)^. Thus, we define the word 
"iteration " to mean running the cell program in each cell and running the cell 
program (ignoring replication) in the right daughter cell. The reason why this helps 
markedly is still under investigation. The steps required for one iteration are shown in 
table 4.1. In the first "interpretation" line in the table a 10 appears (asterisk) because 
the cell program (after modulo operation) is 0, this refers to the zeroth function, 
which in this case is 10 denoting the XOR operation. After one iteration the circuit is 
represented by the encoded phenotype: 2 3 10 3 4 11 which is the circuit shown in 
Fig. 4.4. 

In the next iteration the rules defined in equation 4.2 are then applied to each cell 
of the phenotype. Applying the rules to the cell defined by 2 3 10 we obtain 1 2 10 
that is replicated (since D=l) giving 1 2 10 1 2 10 3 4 11 and when the program is 
run with 1 2 10 (node 5) we obtain 2 3 11 (note D is ignored on the second 
differentiating step - see earlier). Finally applying the rules to the cell defined by 3 4 



^ This step was found to allow more diversity in nodes in the phenotype, without it there tended 
to occur repeat node groups. 
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11 we find that if replicates to 0 1 11 0 1 11 and on differentiating the last cell we 
obtain the new encoded phenotype 1 2 10 2 3 11 0 1 11 5 6 11. This is 

even-four parity circuit below shown on the right in Fig. 4.4. 



Table 4.1. Applying the cell program to the seed cell (first iteration) 



Seed cell 






Cell program outputs (see equation 4.2) 


A 


B 


F 


P 


m A) 


A(B, F) 


Z 


Y 


Y 


IT 


0 


0 


10 


4 


1 


0 


10 


11 


4 


1 














Zmod 4 


Ymod4 


Xmod2 


Wmod2 


New left cell 


2 


3 


0 


1 


Interpretation 


2 


3 


10* 


replicate 


Right cell 










2 


3 


10 


5 


1 


0 


13 


14 


5 


1 














Zmod 5 


Ymod5 


Xmod2 


Wmod2 


New right cell 


3 


4 


1 


1 


Interpretation 


3 


4 


11 


ignore 



0 



V=^5>-1=4E> 




Fig. 4.4. The developed circuit (one iteration left), two iterations (right). 

The idea of growing graphs or circuits leads to some very interesting consequences. 
It is possible for us to provide rewards during the growth process. It also allows us to 
consider trying to evolve programs that construct classes of circuits, by requiring that 
at particular points in time the circuit have a particular behaviour. In the next section 
we describe experiments and results for the problem of growing the class of even- 
parity functions. 



5 Experiments and Results 

In the experiments reported here we were able to choose from a set of seven primitive 
arithmetic functions to build the cell program. It is extremely difficult to know what 
would be a good choice for such functions^. The functions chosen are listed below: 



^ This is akin to trying to define a sequence of DNA base pairs that will give rise to a specific 
phenotypic trait. The function set chosen (equation 5.2) so as to provide a "complete" set of 
arithmetic operations with the addition of comparisons (f^) and controlled swapping (f^). 
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fo = foix,y) = x+y , f^ = f^(x,y) = x-y 

h = f 2 i^^y) = xraod{y + \) , = f^(x, y) = xy , = f^(x,y) = x/ y 



#1 x>y #x z = 0 

fs = fs(x,y) = V , fe = f6(x,y,z) = v 

x< y \yz^0 



(5.2) 



Where x, y, z are the integer-value inputs to any node in the cell program. Note that in 
the operator is integer division. The circuit phenotype is built from the two binary 
input (a and b) functions p(a, b) that are defined in table 5.2: 



Table 5.2. Possible circuit functions 



Integer code 


P (a, b) 


Integer code 


P (a, b) 


0 


0 


8 


AND(NOT(fl),fc) 


1 


1 


9 


NOR(fl, b) 


2 


a 


10 


XOR(a, b) 


3 


b 


11 


XNOR(a, b) 


4 


NOT(fl) 


12 


OR(a, b) 


5 


NOT(i) 


13 


OR(fl, NOT(fo)) 


6 


AND(a, b) 


14 


OR(NOT(a), b) 


7 


AND(a, NOT(fo)) 


15 


NAND(a, b) 



In the first experiment we wanted to test the feasibility of the new approach, so we 
tried a simple problem of evolving the program that would construct a binary adder 
with the function defined in section 3. The experimental parameters are seen in ta- 
ble 5.3. We obtained three perfect adders designs. When observing the runs we ob- 
served that the algorithm reached fitness 14 (two bits incorrect) quite quickly but was 
often unable to improve. The average fitness obtained was 14.18. This does not com- 
pare well with a directly (non-developmental) evolved circuit (where the genotype 
directly encoded the circuit using the same representation as the phenotype) with the 
same parameters. We obtained 100 perfect adder circuits in that experiment. In the 
second experiment we tried to evolve a program that would build a series of even 
parity functions. The experimental parameters are shown in table 5.4. The circuit 
functions (phenotype) chosen for this experiment allow even parity functions to be 
built easily. The fitness of the cell program was defined as the sum of the correct 
truth table bits for all the target functions. We obtained 60 cell programs that could 
correctly build the even parity functions at the desired iterations. All these programs 
were subsequently tested at iteration four to see whether any of them could correctly 
build even-6 parity, and one program succeeded. 

Unfortunately it proved not to be a general solution to the even-parity problem as 
it was found that it did not correctly build even-7 parity at the fifth iteration. Other 
researchers have successfully evolved completely general solutions to even-parity 
problems. However they all provide bit strings serially and work at a much higher 
level of abstraction.: machine code [15], recursive functions [27], and lambda abstrac- 
tions [28]. 
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Table 5.3. Parameters for experiment 1 



Parameter 


value 


Population size 


5 


Mutation rate 


8% 


Number of generations 


20,000 


Number of runs 


100 


Number of circuits 


1 


Desired circuit 


One-bit adder 


Iteration at which circuit required 


4 


Maximum number of genotype nodes 


20 


Genotype node functions used 


0-6 


Phenotype node functions 


6,7,10 



Table 5.4 Parameters for experiment 2 



Parameter 


value 


Population size 


5 


Mutation rate 


3% 


Number of generations 


100,000 


Number of runs 


100 


Number of circuits 


3 


Desired circuits 


even-3, even-4, evenS parity 


Iteration at which circuit required 


1,2,3 


Maximum number of genotype nodes 


40 


Genotype node functions used 


0-6 


Phenotype node functions 


10,11 



6 Conclusions 

We have presented a new way to evolve digital circuits that uses a developmental 
mapping that is based on the idea of evolving the program for a cell. It is capable of 
constructing a number of Boolean circuits at user specified points in time. The circuit 
construction program can in principle build circuits of arbitrary size. However, in 
practice the genotypes are not as evolvable as direct encodings. Designing an effec- 
tive developmental mapping is a very difficult problem. The mapping we have pre- 
sented is very general and flexible. However there are many aspects that require fur- 
ther investigation. In our representation the "neighbourhood" of a cell are the 
positions of the cells that are connected to it. Intuitively it seems more natural to take 
into account the functions of the neighbours. We have used a form of Cartesian GP 
that is linear in nature. In would be interesting to investigate the behaviour of the 
developmental algorithm if the circuit was defined in terms of a two-dimensional 
grid. One could evolve a circuit in which gates were arranged in columns and one 
could define the neighbours of a gate to be some gates that were physically close. 
When the behaviour of the program that developed correct even parity functions was 
examined it was found that the circuits at successive iterations were generally quite 
different. One could potentially inherit more structure from previous circuits if the 
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cell program was not run in every cell but only in the last cell. This of course, would 
depart from the biological inspired form of development. The form of development 
described here works at a quite abstract level in that the integers that form the input to 
the cell program are labels and addresses. It may be that a more direct representation 
would be more effective, for instance, one in which wires are particular types of cells. 
The problems we have studied here are all relatively simple and it may be that devel- 
opmental approaches are more valuable when the circuits being grown are enormous. 
Many authors tend to believe that the shorter developmental genotypes will be more 
evolvable, however, this work suggests that the issue is much more complex. 
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Abstract. The developmental metaphor has clear advantages for the 
design of physically-realizable artifacts, particularly when coupled with 
evolutionary algorithms. However, the embodiment of a developmental 
process in a purely computational system appears much more problem- 
atic, largely because embryogenesis evolved for the purpose of synthesiz- 
ing 3-dimensional structure from a linear code, not for growing Universal 
Turing Machines. 

This research considers possible models of computational problem-solv- 
ing based on the 5 primary developmental stages: cleavage division, pat- 
terning, cell differentiation, morphogenesis, and growth. A specific devel- 
opmental approach to the NP-Complete problem, vertex cover (VC), is 
discussed, as well as a general model of developmental computation based 
on a multicellular extension of PUSH a new stack-based language 
designed specifically for evolutionary computation. 



1 Introduction 

While applied computer scientists hunt feverishly for the next killera app (applica- 
tion), basic researchers search for a new killer an (analogy). Biology has provided 
a steady stream of exceptionally useful metaphors, resulting in technologies such 
as evolutionary computation (ECh ^HI . artificial neural networks (ANN) [0, ant- 
colony optimization (AGO) [0, and artificial immune systems (AIS)[0. One of 
the newest hot possibilities is natural development: the process by which a single 
cell gives rise to a fully-functioning multicellular organism. 

The potential advantages abound for incorporating a developmental process 
into tasks such as automated design, particularly when an evolutionary approach 
is employed. The most convincing argument lies in the simple fact that natural 
genomes encode recipes for development, not blueprints for the 3-d layout of 
the final organism. Hence, the genome is a very compact description of a very 
complex product. Automated evolutionary design systems using a blueprint ap- 
proach must resort to huge genomes for describing complex artifacts, whether 
hardware circuits, computer programs, simulated organisms, airplanes or even 
furniture. However, in many cases, these products have an organized hierarchi- 
cal structure that is often replete with symmetries, redundancies and recursion. 
The difficulty of evolving such patterns in a monolithic blueprint genome scales 
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exponentially with the size of the desired artifact, but concise recipe genomes 
can generate the appropriate structure with only a constant scaling factor, often 
zero, relating the sizes of genotypes to phenotypes. 

However, the complexity of the natural developmental process is daunting, 
particularly to a computer scientist. One may argue that so too is evolution, and 
Holland [0 and others have masterfully tamed it for conventional use. But, the 
founding fathers of EC had a beautifully simple foundational abstraction, laid 
down by Darwin himself, on which to build their fortress: the triad of variation, 
heritability and selection (survival of the fittest). By instantiating these three 
factors in a myriad ways, EC has become an omnipotent tool for difficult search 
problems. 

Development does boil down to one simple abstraction, that of the genome 
as a recipe for phenotype generation, which has already been widely exploited 
in EC. But delving a bit deeper in search of a slightly more detailed set of 
useful, transferable principles, the going gets much tougher. There are a few 
basic stages of development, as described below, but the utility of their analogs in 
artifact design tasks is far from obvious. Many basic developmental mechanisms, 
such as cell duplication, differentiation and migration, are exemplary tools for 
generating a structured, heterogeneous 3-dimensional object in vivo, but it is 
unclear whether they have legitimate utility in solving problems in silico. 

2 Five Basic Stages of Development 

Natural development can be characterized by five distinct (but temporally and 
spatially overlapping) processes: 1) cleavage division, 2) pattern formation, 3) 
morphogenesis, 4) differentiation, and 5) growth. The following is a brief sum- 
mary of these processes, as detailed in [E]. 

In cleavage division, the original egg cell divides repeatedly, with each daugh- 
ter cell receiving a copy of the mother cell’s DNA plus half its biomass. The 
daughter cells do not grow before themselves dividing. Hence, after n rounds 
of division, an egg cell of mass M is replaced by 2” cells of mass M ^ 2". The 
function of this process thus appears to be the copying and distribution of the 
genome into many (nearly) identical cells, which can later differentiate, migrate 
and grow to form the heterogeneous phenotype. The embryo thus becomes a ball 
of cells called the blastula. 

Although even a fertilized egg has slight asymmetries, the second process, 
pattern formation involves critical symmetry breaking to form the anterior- 
posterior (head to tail) and dorsal-ventral (back to front) axes. Chemical gra- 
dients created by certain signalling-center cells or organizers form these axes, 
an orthogonal set of which creates a coordinate system. Regional organizers and 
their gradients can also provide some of the finer asymmetries and details of 
the developing embryo, such as the formation of the different fingers of a hand. 
The three germ layers {ectoderm, mesoderm and endoderm), which are initially 
concentric, are also created during pattern formation. 

Morphogenesis involves major changes in 3-dimensional shape of the embryo. 
In one morphogenic process, gastrulation, the three germ layers split and fold to 
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form the mouth and gut cavities, thus transforming the blastula into a gastrula. 
In neurulation, the ectoderm folds to create the neural crest and tube, precursors 
to the vertebrae, limbs and spinal cord. Cell migration underlies most morpho- 
genetic processes. This results from cellular changes in gene expression, which 
can, among other things, a) alter the surface chemistry of the cell, changing 
its repellant/attractive tendencies toward other cells, and b) modify the elastic 
properties of the cell membrane, making the cell more or less likely to change 
shape and/or move when pressed upon by other cells. 

Cell differentiation is the process by which cells achieve different patterns of 
gene expression within the embryo. Some of these activity changes are reversible, 
but many are not. As development proceeds, more and more cells become deter- 
mined: essentially stuck in an activity pattern. These attractors in activity space 
eventually gives rise to one of the many morphologically distinct cell types, such 
as a pancreatic cell or a pyramidal neuron. 

There are at least 3 key mechanisms for differentiation: 1) induction, 2) asym- 
metric cell division, and 3) timing. Induction involves intercellular signalling, 
achieved by either direct contact between cells or via diffusion of chemical mes- 
sengers into the extracellular fluid. In asymmetric cell divisions, the two daughter 
cells do not get exactly the same chemical constituents from the mother cell and 
hence may subsequently follow different paths of development. Finally, cells may 
have built-in chemical clocks that trigger changes in gene activity. 

The final main developmental process, growth, is just an increase in size 
of the organism, with only minor changes in shape. This may result from the 
growth of individual cells, the replication of cells (with daughter cells growing 
to adult size before dividing) or the accumulation of extracellular material such 
as fluids or bone matrix. Induction and chemical timing mechanisms play key 
roles in controlling growth. 

The sequencing, overlap and interaction between these processes varies across 
species, although the order of their initiation roughly follows the enumeration 
above. That is, the egg cell begins development with many cleavage divisions, 
followed by overlapping pattern formation, morphogenesis and differentiation. 
This results in a miniature organism which then grows to newborn size with 
visible external changes but only minor internal and external rearrangements. 

3 Utility of Development for Evolutionary Design 

From an artificial life (ALife) perspective, each developmental process is inter- 
esting and well worth simulating. For example, Japanese researchers [iTrn] have 
achieved the emergence of several developmental stages from the primitive be- 
haviors of individual cells: their metabolisms and adhesive properties. 

However, from the perspective of automated evolutionary design, the el- 
ements of natural development stimulate interesting, but largely unjustified, 
analogies. For example, although there have been attempts to grow artificial 
neural network topologies using everything from simple, evolvable rules [Z] to 
detailed neurogenetic-like processes Q, these methods cannot match the perfor- 
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mance of less biological approaches to ANN evolution such as SANE m and 
NEAT [O]- 

The basic concept of simulated evolutionary developmental design (SEDD) is 
straightforward. Assume a generic evolutionary algorithm, in which the genome 
embodies a recipe for creation of the phenotype. The SEDD system will begin 
with a single module/cell containing the genome plus some other cell-specific 
properties, e.g., a vector of state variables such as size, location, activity level, 
age, etc. By executing the genome, the cell may alter its own state, send sig- 
nals to other cells, reproduce (by splitting), or die. Over time, the SEDD will 
create a collection of cells that can be interpreted (by, for example, integrating 
information from each cell’s state) as a testable phenotype, whether an ANN, 
an arithmetic circuit, a computer program, or a coffee table. 

4 A Specific Problem: Vertex Cover 

Consider the classic NP-complete problem of vertex cover (VC) [EJ: given a 
constant integer k and an undirected graph consisting of a vertex set, V, and an 
edge set E, find a set of vertices, Vg C V, such that size{Vs) < k and each edge 
in E has at least one of its two vertices in Vg- 

To grow a VC solution, begin with a single cell whose internal state represents 
Vi & V . The propensity for a cell to divide is proportional to the size of the edge 
set of Vi, so the seed cell should represent a well-connected vertex. These cleavage 
divisions would quickly lead to many Vi cells. 

Inductive signalling from these cells could then cause problem-related dif- 
ferentiation by a simple procedure: Vi cells would emit inhibitor signals for any 
vertices in V, the immediate neighbors of Vi. Cells would then read gradients of 
inhibitors (and activators) and possibly change their own vertex as a weighted 
deterministic function of the signals. Cells could also have a not-vertex state, 
such as ^ Vj, and such cells would emit activation signals for any vertices in Nj . 

Finally, cells could migrate up and down signal gradients for their vertex 
type. For instance, a cell expressing Vi (~ Vi) would migrate toward increasing 
(decreasing) Vi activator and away from increasing (decreasing) Vi inhibitor, 
using a deterministic function based on weighted contributions from the two 
gradients. Over time, this might lead to relatively stable configurations with few 
conflicts between the ambient signals and the local cell types. 

After a certain number of developmental steps, or after a certain degree of 
stability were attained, a potential solution, Vp could be read from the set of all 
cell vertices, with each ~ Vi cell cancelling out one Vi cell. 

To put this process under evolutionary control, the genome could encode an 
initial seed cluster of cells, a set of parameters for the weighted state-change 
and movement routines, or even a GP program with primitives for expressing 
activators/inhibitors, for migrating, etc. 

Although the odds of this working successfully are slight, it illustrates that 
even a conventional problem could be attacked from a developmental perspective, 
in this case, one involving a) cleavage division, b) differentiation by induction, 
and c) morphogenesis by migration. 
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5 Multicellular Programs 

Although tailoring a developmental process for a particular problem might prove 
fruitful for certain types of well-structured problems, a higher-level metaphor 
that links development to a general computational device would provide a flexible 
platform for embryonic problem-solving. 

To fully exploit the developmental metaphor in a computational setting, we 
need a meaningful correspondence between living cells and computational units, 
as found in the Biowatch project m Fortunately, the recent development of 
PUSH a, stack-based language designed for genetic programming P, pro- 
vides an ideal substrate for this analogy. 

In PUSH, an (evolvable) genome encodes a list of primitive operations (in 
postfix form) that can act on a variety of data types. So a single program may 
include standard arithmetic, boolean, list and array operators. While standard 
GP must handle type mixing with strongly-typed programs mg and their ensuing 
computational overhead, PUSH very elegantly solves the problem with multiple 
data stacks, one for each data type. Thus, when an operator needs arguments 
of particular types, PUSH simply pops them from the appropriate data stacks; 
the operator’s result is then pushed onto the correct stack as well. PUSH even 
provides code stacks, so that programs can build and run code chunks on the 

fly- 

PUSH is implemented in LISP, a language that treats data and programs 
equally in the sense that programs can be a) passed as arguments to func- 
tions, just like data, and b) constructed by standard data-manipulation routines 
from bits of numeric and symbolic data, just like any other data structure. So 
data can be processed directly or incorporated into other programs that process 
data, which is analogoues to living cells, where proteins and other compunds 
are both metabolic resources AND components of the machinery that performs 
metabolism, mitosis, RNA translation, etc. Hence, by running Lisp-like code in 
a population of weakly-communicating modules, one can achieve a degree of 
abstract biological realism. 

5.1 Cellular PUSH Programs 

A Cellular PUSH Program (CEPP) is a module that runs pieces of PUSH- 
like code and resides in a network containing other CEPPs. It communicates by 
pushing data values onto the stacks of neighboring CEPPs, migrates by swapping 
places with a neighbor, and reproduces by copying its genotype (a PUSH-like 
code chunk) into a newly-formed neighbor cell. The current CEPP prototype, 
running in Common Lisp, provides a basic feel for the language, as illustrated 
by the detailed examples below. 

As in PUSH, CEPP relies on a type hierarchy to control the inputs and out- 
puts to typed functions, making sure that they are popped from and pushed 
onto the proper stacks. This hierarchy has two main branches: atom and struc- 
ture. Atoms are all single items such as booleans, numbers or symbols (with 
numbers being further partitioned into integers and reals). A structure is any 
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data structure, such as an expression (a possibly-nested list) or an array. The 
two main subtypes of expression are code and data. A CEPP runs by popping 
and executing items from its code stack. Only the leaf types, i.e. most specific 
types, of the hierarchy have corresponding stacks. 

CEPP also has a type stack, which represents a prioritized list of types used to 
disambiguate general function calls. For example, the function -|- takes two input 
arguments of type number and returns a number. When a -I- is invoked, CEPP 
checks the type stack to find the first occurence of a leaf subtype of number, 
whether integer or real. It then pops from and pushes to the appropriate stack 
when performing the addition. Unlike the other stacks, the type stack is never 
popped. 

A CEPP colony consists of one or more CEPPs, all sharing the same geno- 
typic code chunk. Simulation of the colony involves the repeated running of 
individual CEPPs for a fixed number of execution steps. The colony begins as 
a single seed cell, which may or may not multiply to form an interacting cell 
population. 

Consider the following CEPP genotype: 

( quote (integer 1 - real dup) 
quote (real -k) 
integer dup 1 > if)) 

To test this CEPP, push the integer 8 and the real 13.0 onto their respec- 
tive stacks. The genotype is then pushed onto the code stack, and CEPP’s do 
command is invoked. This executes the top item on the code stack and pops it 
AFTERWARDS, thus facilitating self- modifying code. 

CEPP uses postfix notation, but the quote command is an important excep- 
tion. When reading the symbol quote, the interpreter sets a flag, telling it to 
push the next item onto an expression stack. The default type priority at the 
bottom of the type stack lists code as the highest priority expression. In this 
case, the next item is a list, so the sequence (integer 1 - real dup) is pushed onto 
the code stack. The second quote then forces (real +) to he pushed onto the 
code stack, which now contains: 

(real -k) 

(integer 1 - real dup) 

(quote (integer 1 - real dup) quote (real -k) integer dup 1 > if) 

The symbol integer is then pushed onto the type stack, and this instructs 
the next command, dup, to duplicate the top of the integer stack, which then 
consists of (8, 8), where the top item is leftmost, dup is useful for preserving the 
top of a stack in situations where an upcoming operator (in this case, if) will 
pop it. 

Next, the integer 1 is pushed, and then 8 > 1 is computed, giving true, which 
is pushed onto the boolean stack. The if operation pops the code stack twice 
and the boolean stack once. Since the boolean is true, the second code chunk, 
(integer 1 - real dup), is executed. This dictates the subtraction of 1 from the 
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top of the integer stack and then a duplication of the top of the real stack, which 
then reads (13.0, 13.0). 

During the CEPP’s alloted execution steps, if the code stack becomes empty, 
the genotype is re-pushed and executed again. Also, the CEPP’s stack state is 
preserved between time steps, with interrupted code chunks being resumed on 
the next time step. 

Hence, in the current example, the genotype is re-pushed and executed again, 
but this time with an integer stack of (7) and a real stack of (13.0 13.0). This 
process of decrementing the top integer and adding another 13.0 to the real stack 
continues until the top integer is 1 and the real stack contains eight 13.0’s. Since 
1 > 1 is now false, each execution of the if in the genotype will cause the (real 
+) code chunk to be executed, eventually resulting in 104.0 (8 x 13.0) atop the 
real stack. 



5.2 Computational Resources in CEPP 

CEPP adds the concept of resources into the PUSH paradigm, such that every 
item that is pushed onto a stack needs to be created from the available resource 
pool, and every popped item replenishes the pool. Resources are of different 
types, corresponding to the leaf nodes on the CEPP type tree. For each descen- 
dent leaf subtype of the class atom, every instance of that leaf type requires one 
unit of the specific resource. For example, pushing a boolean onto the boolean 
stack requires one boolean unit of resource. 

The leaf descendents of the class structure demand greater resources, essen- 
tially one unit for each atomic item and one unit of glue to bind it to the data 
structure. For example, with lists, the glue is similar to a LISP cons cells. Hence, 
the previous code chunk, ( integer 1 - real dup ), is a structure that requires 1 in- 
teger resource unit, 2 type units (for integer and real), two symbol units (for — 
and dup) and 5 units of code glue. 

Most primitive CEPP functions, such as the arithmetic, logical and list oper- 
ations, pop input items from stacks and push their results onto the same or other 
stacks. In many cases, such as arithmetic, the resources freed up by the input 
pops are more than sufficient to cover the resource needs of the ensuing push. 
However, some operators such as the boolean comparators, require a resource 
that is not provided by the inputs. Hence, some operations can be blocked by 
unavailable resources, despite the presence of all necessary input arguments. 

Also, any data item appearing in a code chunk requires additional resource in 
order to push it onto a stack. Thus, while executing (3 4 9.2 integer -I-), CEPP 
will require an additional 2 integer units, one real unit and one type unit in 
order to push the first 4 items onto their respective stacks. In short, when the 
contents of a code expression become active (via their presence on stacks), they 
drain resources. Also, the code chunk itself drains resources, since it too resides 
on a stack. Thus, although a recursive genotype can easily copy several versions 
of itself onto the code stack, using a (code dup) sequence, each copy requires 
resources, whose availability can thereby limit recursive depth. 
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This resource abstraction can be viewed as a computer memory in which 
certain finite areas are statically allocated for the different data types. Data on 
stacks represents biomass, in the sense that the complete stack state is a struc- 
tured product of the ongoing computation, just as real biomass is the product of 
life. Similarly, code mirrors genetic material (DNA, RNA, etc.) in that it serves 
as a recipe for building this computational edifice. 

5.3 Cell Division, Migration and Commnnication 

To reproduce, a CEPP genotype must include the repro symbol. When executed, 
repro attempts to copy the genotype, using the CEPP’s resources. If successful, 
a new CEPP is created and given a share of the mother’s resources, where the 
amount is determined by the ratio of the top two elements on the integer stack. 
The mother and daughter then become neighbors in the network/colony. 

The seed CEPP cell begins with resources that constitute the entire pool 
for the CEPP colony; no resources are ever created, only redistributed among 
stacks and CEPPs. This puts a natural restriction on the number of possible cell 
divisions and hence the colony’s size. 

The CEPP colony has a simple topology: each cell has a neighborhood, ini- 
tially composed of its parent cell and all of its own children. However, cell mi- 
gration, via the mig primitive, can quickly alter this arrangement. It causes a 
CEPP, C, to pop the integer stack and use that value as an index (modulo the 
neighborhood size) into its neighbor list. The chosen neighbor, N, and C then 
swap neighborhoods. 

To exchange information, C can push items onto any of its neighbors’ stacks 
by using the pipe operator, which commands the CEPP to pop the integer stack, 
again using the value to select a neighbor, N, which becomes the piped-cepp for 
C. Now, all future operations performed by C will take inputs from C’s stacks 
and will use C’s resources to push outputs onto N’s stacks. Even operations like 
dup will duplicate the top element of a C stack and then push the copy onto 
the corresponding N stack. The cut operator signals a closing of the pipe and a 
return to normal internal processing. Also, the pipar operator assigns the parent 
as the piped-cepp. 

5.4 A Developmental CEPP Example 

Consider the following genotypic code string: 

(cut code quote (real 1.0 - 10 1 repro integer 1-1-0 pipe dup 0.0 

code quote (integer dup * data pipar conv)) 
real dup 3.0 > if 

cut code quote (data pop mig real 2.0 *) 

data append dup length 4 > if 

Skipping the deepest details, a functional overview suffices to illustrate the 
developmental process. The genotype essentially encodes two conditional actions: 
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reproduction and migration. A CEPP can reproduce only if a) A > 3, where X 
is the top element of its real stack, and b) it has the available resources to copy 
its genotype. By pushing 10 and 1 onto the integer stack prior to reproduction, 
the mother insures that she will retain 10/llths of her resources after splitting. 
When the CEPP reproduces, it uses the sequence (0 pipe) to open a pipe to 
its daughter cell. It then copies the code sequence (integer dup * data pipar 
eonv) onto the top of the daughter’s code stack, along with a copy of N, the 
top element of the mother’s integer stack. When the daughter begins running, 
it first executes this sequence, causing it to compute N'^. The pipar command 
then opens a pipe back to the parent, which the daughter uses to copy onto 
the top of the mother’s data stack. 

In short, the mother uses its children as integer-squaring subroutines. How- 
ever, each also receives a copy of the mother’s genome, so after squaring an 
integer, each daughter will push the genome onto its code stack and behave 
similarly to the mother. However, note that the mother pushes a 0.0 onto the 
daughter’s real stack. This helps prevent the daughter’s from satisfying the cri- 
teria for reproduction (A > 3), although the complexities of resource-restricted 
computation and CEPP interactions often lead to the pushing of positive reals 
onto the stacks of many different CEPPS. (Accurately predicing the behavior of 
these colonies is very difficult). In the simulation below, the seed CEPP begins 
with a 6.0 atop its real stack, giving it the opportunity to act as a structural 
organizer of the colony. 

The other main activity, migration, occurs only if the CEPP has a list con- 
taining 5 or more elements (i.e., squared integers) on top of its data stack. The 
list is created by the append operation near the end of the genotype, which 
gradually compresses the elements of the data stack into a single list. 

In a nutshell, this genome was handcrafted to allow the seed CEPP to create 
daughters, use them to produce squared integers, and then migrate to a new 
neighborhood (by swapping places with one of its daughters) for another round 
of reproduction. 

Figure 0shows the basic processor topology generated by a run of this geno- 
type. The seed CEPP, cell 0, generates the network’s central nodes (1, 2, 3, 
12 and 25), which gradually unfold into the horizontal spine as the seed mi- 
grates from left to right. Static analysis of the genotype indicates that the seed 
should reproduce 5 times, thus receiving 5 squared integers, before migrating. 
This clearly occurs in the region of cell 12, where all 6 cells (10, 11, 12, 13, 18 and 
19) are daughters of the seed. However, further to the left, the pattern indicates 
that a) the seed produced only a few offspring before migrating, and b) non-seed 
cells also managed to reproduce (although none migrated). 

That the emergent pattern conflicts with the intended design of the geno- 
type, which was meant to produce several groups of 5 non-reproducing cells, 
indicates the complexity of CEPP-colonies. Two factors confound our predic- 
tions: 1) resource-limited computation, and 2) intercellular stack pushing. 

In resource-limited computation, a series of actions may be only partially 
carried out. For example, the sequence (real dup 0.0 >) is intended to test 
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Fig. 1. CEPP topology generated largely by the reproduction and migration of the 
seed CEPP. Numbers indicate order of birth, while links connect neighboring cells in 
the emerging computational network. 



whether the top real number, X, is positive. After the test, the real stack will 
appear as before, with X on top; and the boolean stack will hold the result of 
the test. However, if the CEPP lacks any boolean resource, then it will duplicate 
the X, push a 0, and then fail to perform the greater-than test. Thus, it fails to 
consume the two new stack items, and the real stack then contains (0, X, X). 

The results of inter-cellular signalling can be even more severe, as whole 
new expressions can be pushed onto neighboring stacks, thus interrupting their 
planned action sequences. Like PUSH, CEPP is designed for use with evolution- 
ary algorithms, so it will be interesting to see in future work whether evolv- 
ing CEPPs include robust genotypes that can handle these unexpected stack 
changes, or whether the main prospects for robustness lie in the distributed 
colony itself. 

6 Discussion 

This paper shows that the developmental process can conceivably map to both 
specific problem solvers and more general computational systems, although nei- 
ther has yet to be proven useful. 

Celluar PUSH programs enable the emergence of a computational colony 
from a single program, thereby providing a developmental basis for problem 
solving. To assess the quality of this metaphor, return to the 5 basic stages of 
development. 
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1. Cleavage division occurs when a CEPP repeatedly copies its genotype, with- 
out performing too many other intermediate computations. This often occurs 
early in a run, when cells tend to have a lot of free resources. 

2. Pattern formation can easily occur in CEPP, since cells are mobile and can 
communicate. In the previous example, the seed CEPP governs patterning 
by producing children and then swapping places with them (i.e., migrating) 
to form the horizontal spine. 

3. Morphogenesis is clearly embodied in CEPP migration. 

4. Differentiation occurs through the execution of both a) a CEPPs own code, 
which may force changes to its state and/or its code, and b) piped code from 
a neighbor. Also, a neighbor can alter behavior by piping atoms, such as the 
0.0 sent by the above seed CEPP to its children to (help) prevent them from 
reproducing. Piping thus enables cell induction. The other two mechanisms 
for differentiation, asymmetric division and timing, are also coverd by CEPP, 
since a) cell splitting need not evenly distribute resources between mother 
and daughter, and b) any stack value can serve as a timing variable. 

5. Growth is somewhat tricky: the concept of size in an ongoing computation 
is difficult to assess. One feasible alternative is the amount of data on the 
stacks, since this is information structured by the computation for the com- 
putation, in much the same way that biomass is structured material for 
supporting life. 

Although the accuracy of the analogy is an interesting academic exercise, the 
important question is whether this type of development can improve problem 
solving. Our ongoing efforts to evolve CEPPs will hopefully shed more light on 
this issue. 

One key factor, redundancy, separates software from the physical realities of 
both biology and hardware. In nature, redundancy accounts for large degrees 
of both homogeneity and heterogeneity at the cellular level. Similarly, physi- 
cal devices, such as circuit boards, are replete with redundant components. In 
software, however, redundancy is preferably avoided, with single subroutines be- 
ing created for common activities. Thus, evolving developmental computation 
may not provide a good piece of working software, but rather a computational 
topology that can later be translated into efficient software by modularization 
of redundant CEPPs. 

In general, the proof is in the pudding, but we are still shopping for ingredi- 
ents. Still, the fascinating capabilities of natural development inspires our further 
investigation, in the hopes that this, like evolution itself, holds a little something 
for everyone. 
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Abstract. Due to the scalability issue in genetic algorithms there is a 
growing interest in adopting development as a genotype-phenotype map- 
ping. This raises a number of questions related to the evolutionary and 
developmental properties of the genotypes in this context. This paper 
introduces the NK-development (NKd) class of tuneable fitness land- 
scapes as a variant of NK landscapes. In a first part the assumptions 
and choices made in defining a simplified model of development genomes 
are discussed. In a second part we present results of the comparison of 
NK and two variants of NKd landscapes. The statistical properties of 
the landscapes are analysed, and the performance of a standard GA on 
the different landscapes is compared. The analysis is aimed at identifying 
the influence of the properties by which the landscapes differ. The re- 
sults and their implications for the design of computational development 
models are discussed. 



1 Introduction 

In the context of the evolution of complex phenotypes, such as electronic circuits 
in Evolvable Hardware (EHW) or the evolution of neural networks, the scalabil- 
ity problem of genetic algorithms (GAs) has kept applications in a toy-problem 
phase. A promising suggestion is to take the analogy with biology one step fur- 
ther and adopt a bio-inspired development mapping between the genotype and 
phenotype [|Kitif»IbK99|HTvR|vRLMU7| . This holds the promise of improved 
modularity of the phenotype, relatively shorter genotypes for complex pheno- 
types, remapping of the search space to improve search performance (e.g. by 
neutrality) etc. An obvious disadvantage is the low degree of evolvability of de- 
velopmental representations: by their very nature, developmental genes depend 
on high epistatic interaction in constructing the phenotype, hence rendering the 
genome less modular and hard to evolve. Nothing restricts engineers of compu- 
tational developmental genomes to construct genomes in such a way that these 
effects can be avoided, or at least minimised. In order to do this, more indicative 
knowledge about the fitness landscapes of developmental genomes is required. 

A useful tool in studying the general properties of fitness landscapes are 
Kauffman’s NK landscapes [KauUSl In this paper we propose NKd landscapes 
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as adapted NK landscapes to model basic principles of biological development. 
The properties of these landscapes will be analysed and compared to standard 
NK landscapes. 

2 Simple Models for Developmental Genomes 

2.1 Basic Modelling Assumptions 

For computationally complex mechanisms such as multi-cellular development, 
only very simple models allow analysis of macroscopic properties such as fitness 
landscape structure. From the biological reality, as well as from known (more 
complex) computational models derived from it, a number of basic assumptions 
can be made. Although contestable in a strictly biological context, they will 
serve as basic assumptions for the landscape models investigated later on. 
Assumption 1: The first assumption is the presence of time in the interpre- 
tation of the genome. This means that the action of reading and interpreting a 
gene is to be associated with a moment in time, rather than with the information 
content of the gene only. This induces an order of interpretation in the genome. 
Assumption 2: The order of interpretation induced by the first assumption is 
crucial. In other words, genes expressed later in time exert their influence on a 
situation which is the result of genes activated earlier on during development. 
Due to the increasing amount of cells during development (due to cell division), 
this implies that ‘early’ genes have a more profound effect on the phenotype 
structure than ‘late’ genes. 

Having derived the basic assumptions for our model, we will discuss these further 
in the context of the NK model for tuneable fitness landscapes. 

2.2 NK Landscapes 

In order to continue the discussion, we will first briefly introduce the basics 
of Kauffman’s NK landscapes IICTI . NK landscapes are a class of tuneable 
fitness landscapes, whose two main parameters are N and K (hence the name 
NK). A genotype consists of N genes, where each gene’s fitness contribution 
is a factor of it’s own allele, and that of K others in the genome. These K 
influencing loci can be chosen according to the nearest neighbours scheme (A/2 
left and right of the focal gene) or the random neighbours scheme. For the first 
the genome has periodic boundaries, for the latter it doesn’t. The values of the 
fitness contributions of a gene is randomly assigned and fixed (between 0 and 
1) for every combination of its value in the context of all the possible 2^ allele 
combinations of its influencing genes. The fitness of the total genome is the 
average fitness contribution of its genes. For details about fitness calculations 
and examples, we refer to the work of K a,u ffm a,n |K a,ii 9 ^ or Altenberg|Alt97). 
The parameter K tunes the ruggedness (or multi-modality) of the fitness land- 
scape. In general this parameter is the main influence of the landscape’s prop- 
erties, while the neighbour scheme tends to be of minor importance. 
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The parameter K and the scheme used for choosing which loci provide the epi- 
static inputs per gene (nearest neighbours or random neighbours), decides on a 
combination of incoming and outgoing epistatic connections per gene. For NK 
landscapes, each gene has K incoming connections, and on average K outgo- 
ing connections. For future reference we define I{i) and 0{i) as the functions 
designating the number of incoming and outgoing connections of gene i. 

2.3 Development in NK Landscapes: NKd 

The discussion will now focus on the implementation of the two assumptions 
mentioned above: 

Implementation of Assumption 1 (order of interpretation due to time) : We 
will arbitrarily consider this order to be from left to right in the genome. Genes 
on the left of the genome will now be considered ‘early’, while genes on the right 
are ‘late’. 

Implementation of Assumption 2 (increasing influence of earlier genes): This 
can be implemented in two ways: 

a. by keeping K constant for all genes, but giving early genes more epistatic 
outputs (or pleiotropy) then late genes 

b. by varying K along the genome, so that early genes have low K and late 
genes have higher K. 

NKd (PI) landscapes are inspired by option (a) above. The principles of 
this NK landscape variant was introduced by Kauffman in the context of a 
phylogenic model for studying generative entrenchment [ Ka,ii93J . In that context 
the model was called the ‘hierarchical NK model’. In the context of development 
and the alternative model proposed below, we will adopt the term ‘NKd(Pl)’ 
here, indicating a NK-development landscape, where the pleiotropy of loci follows 
a designated non-uniform distribution, while the K value is constant. 

NKd(K) landscapes are based on option (b) above. NKd(K) landscapes 
model the fact that early loci are hardly affected by any others, while late loci 
are affected by (almost) all previous loci in time. Exactly which loci affect a 
focal gene will be defined by the neighbour schemes used. More details on the 
neigbouring schemes will be discussed in section Q 

Summarising, both NKd variants are based on introducing an unbalance in 
the epistatic interactions of genes, based on their position on the genome. This 
is done in such a way that ‘early’ genes influence more other genes then ‘late’ 
genes. In NKd(Pl) landscapes, this unbalance is introduced by dictating the 
distribution of outgoing interactions, while NKd(K) landscapes are based on 
dictating the distribution of incoming interactions. 

3 Implementation Details of NKd Landscapes 

We will now take a closer look at the details of the NK and NKd landscapes 
that will be analysed in the next section. Experiments were conducted on NK, 
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NKd(Pl) and NKd(K) landscapes. Throughout all experiments N was fixed at 
32. For the NK landscape experiments we can be brief: standard NK landscapes 
with K loci chosen as nearest or random neighbours for the interactions were 
used. We will now focus the discussion on the NKd landscapes used in the 
experiments. 

3.1 NKd(K) 

An NKd(K) landscape is specified by its I{i) distribution (indicating K for every 
gene), and a scheme for choosing the K inputs per gene. NKd(K) landscapes can 
be compared using the average K value which results from the I(i) distribution. 

The different types of NKd(K) landscapes examined ranged over I{i) dis- 
tributions exponential increasing (model distribution: I{i) = linear in- 

creasing (model distribution I{i) = i) and root increasing (model distribu- 
tion I{i) = \/i). The distributions are then scaled to obtain average K values 
of {0,1,2,3,4,8,15}. In deciding which genes influence which, four neighbour 
schemes were tested: 

1. random neighbours: I(i) influencing genes are chosen at random for gene i. 

2. nearest neighbours: loci are chosen left and right of gene i. The genome 

has periodic boundaries. 

3. left nearest neighbours: I{i) neighbours are chosen in an adjacent complete 
block to the left of gene i. The genome has periodic boundaries. 

4. development neighbours: In this scheme the influencing genes are determin- 
istically chosen as far left (or ‘early’) on the genome as possible, using up 
the maximum of iV — 1 outgoing influences of the genes. Assignment of the 
incoming influences is done in a left-to-right manner on the genome. 

3.2 NKd(Pl) 

An NKd(Pl) landscape is specified by its value of K (I{i) = K for all genes), 
and the shape of the pleiotropy- function 0{i). The different types of NKd(Pl) 
landscapes examined ranged over K values in {0,1,2,3,4,8,15}. The shapes 
of 0{i) adopted in the tests were exponential decreasing (model distribution: 
0{i) = e“°’^*)), linear decreasing (model distribution 0{i) = -i-\- 31) and root 
decreasing (model distribution 0{i) = y/31 — i). These basic shapes were scaled 
in order to obtain a correct match between the total amount of pleiotropic (sum 
of the values dictated by the distribution) and epistatic interactions {N x K). 
A second requirement is that when scaling a distribution, the maximum value 
allowed for any gene’s pleiotropy is iV — 1. In deciding which genes influence 
which, two neighbour schemes were tested (nearest and leftnearest schemes are 
incompatible with imposing a 0{i) distribution): 

1. random neighbours: I{i) influencing genes are chosen at random for gene i, 
while obeying the maximum pleiotropy dictated by the 0{i) distribution of 
the influencing genes. 

2. development neighbours: Similar as described above, but now the maximum 
pleiotropy of a gene is dictated by the 0{i) distribution. 
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4 Analysis of NKd Landscapes 

4.1 Overview of Experiments 

The three types of landscapes within the scope of this paper differ by tree pa- 
rameters: 

1. global landscape type (NK, NKd(Pl) and NKd(K)) 

2. neighbour or interaction schemes (random, nearest, left nearest and devel- 
opment) 

3. interaction distributions (constant, linear, exponential and root shaped) 

The goal of the experiments reported is to gain knowledge of the influence of 
these three global parameters on both the statistical properties of the landscapes, 
and the performance of a standard GA on them. The statistical properties of 
the landscapes were extracted from the data gained from two types of walks on 
the landscape: 

1. greedy walks: Starting from a random starting genotype, continuously move 
to a random fitter neighbour at Hamming distance 1, until a local optimum 
is attained. The goal of the experiment is threefold: first of all, statistical 
information is gathered of fundamental properties of the landscape such as 
peaks and fitness values. Second, knowledge can be gained of the overall 
organisation of the landscape such as relative positions of the optima, basins 
sizes etc. In the experiments 100 greedy walks were performed on 100 random 
instances of every landscape type and variant. 

2. random walks: Starting from a random genotype, continuously move to a 
random neighbour at Hamming distance 1, until a predefined number of 
steps is taken. The goal of this experiment is to measure the correlation 
length of the landscape [MdWSQllWeiOOJ . In the experiments 50 random 
walks of 1000 steps were performed on 50 random version of landscapes of 
each type. 

The evolutionary experiments on the landscapes consisted of running a stan- 
dard genetic algorithm over a fixed amount of generations and observing the 
results. The population size was fixed at 30, letting the GA run for 100 genera- 
tions. Bit-mutation probability was 3% per bit and one-point crossover probabil- 
ity 30%. Roulette-wheel selection was used without elitism. For each landscape 
type, 30 GA runs were performed. 



4.2 Classification of Obtained Information 

From all these experiments the following data was obtained: 

1. Mean walk length to a loeal optimum {WL)\ Rugged landscapes tend to 
have a local optimum nearby, while smoother landscapes allow longer walks 
of continuous improvement. 
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2 . 



3. 



4. 



5. 



Mean and maximum fitness of optima encountered (F and Max F): This 
data will be used to evaluate the performance of the GA. The mean fitness 
of local optima in highly rugged NK landscapes dwindles to 0.5 as a result 
of the ‘complexity catastrophe’ 

Massif Central effect (MC) : This refers to the effect that high peaks in the 
landscape tend to be close to each other, forming a ‘Massif Central’. This 
effect is favourable, since it indicates that in the surroundings of a high opti- 
mum, other high optima can be found. This effect can be observed from the 
relation between the distance of encountered optima to the highest optimum 
and their relative fitness fitness. If the effect is present, the distribution of 
optima according to their distance to the global optimum will be sloped as 
a result of the unbalance in fitness. The effect can then be numerically ob- 
served from the slope of a linear fit to the dataset. The steeper the slope 
of the regression curve, the more prominent the effect. Figure |I(a) [ gives an 
example of a fitted plot that exhibits the ‘Massif Central’ effect. 

Unequal Distribution of basin sizes of optima (UBS): The optima of smoother 
NK landscapes show a tendency to have large basins. This can be observed 
from a sloped distribution of optima fitness scores vs. the number of times 
the optimum is reached during the greedy walks. This effect is favourable, 
since optima with large basins of attraction are reached more easily by an 
optimisation algorithm. An indication of the trend of the effect can again be 
obtained by fitting the data and observing the slope of the regression curve. 
Figure |l(b)| shows an example. On the plot, fitness values relative to the 
fitness of the best optimum found per landscape (vertical axis) are plotted 
against the number of times the optimum was reached during the greedy 
walks. 

Correlation length of the landscape (CorrL) [ IMdWShl M^ei9n| : The correla- 
tion length of a fitness landscape is the distance h from a genotype where the 
autocorrelation function p{h) = 1/2. The autocorrelation function is defined 
as: 



p{h) = ^R{h) 



R{h) 

m 



E[{Fh - PF)iFo - /if)] 
E[{Fq — fJ.F){Eo — fip)] 



where cr^ is the variance and R(h) the auto-covariance of the stationary 
stochastic process. E[x] is the expected value or mean, Fh the fitness at dis- 
tance h from the observation point, and pp the mean fitness. The correlation 
length is an overall measure of ruggedness of the landscape. 

6. Relative fitness reached by a standard CA (Evol. F and Ratio): The highest 
fitness obtained from performing a series of GA runs on a landscape was 
compared to the highest fitness ever encountered on the same type of land- 
scape (but other random instances) during the greedy walks. This value can 
be used to compare the fitness that can be achieved by a GA in comparison 
to the (limited) knowledge available of the fitness achievable on that type of 
landscape. 

7. Number of evaluations to reach maximum fitness by a standard CA(nrEvals): 
The number of evaluations needed to reach the maximum score in a GA 
runs was recorded and the results averaged. This can give an indication of 
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NK.N32K1. random. massif 




(a) An illustration of the ‘Massif Cen- 
tral’ effect 

Fig. 



NK.N32K1. random. basin 




(b) An illustration of unequal basin 
size distribution 

1 . 



the number of generations a GA can increase the maximum fitness before 
converging. This can then be compared to the quality of the optimum (see 
previous item) and the overall properties of the landscape obtained from the 
walks. 



4.3 Results and Observations 

The experiments conducted in the scope of this paper generate huge amounts of 
data, that simply cannot be listed in a single paper. We will limit the discussion 
to a comparing low {K = 1), medium {K = 3) and higher {K = 8) epistatic 
landscapes. The upper limit of K = 8 also allows comparison of all variants of 
landscapes, since for example it is impossible to scale an exponential K distribu- 
tion in such a way to obey all restrictions discussed before. In order to discuss 
the influence of landscape type, interaction distribution and neighbour scheme, 
we will list a selected number of results from landscapes that differ only by the 
property under investigation. 

The results for the ‘Massif Central’ (MC) effect and the unequal basin size 
distributions (UBS) will be summarised by the slope a of the linear fit to the 
data defined as: linfit{x) = ax + b. See Figure [Qfor an illustration. In contrast 
to the results for MC, the fitting slope of the UBS is only indicative, and not 
intended for detailed comparisons, due to the natural trend of the data. The 
x-range over which the data for the UBS measurements is defined, differs with 
the ruggedness of the landscape (the smoother the landscape, the less optima, 
and the broader the a:-range). For this reason, the data for UBS was a;x-scaled to 
a range of 100, before the fitting slope was calculated. If the effect was virtually 
non-existent from the experiments performed, the result was marked between 
brackets. The results of the experiments are shown in Tables n Eland 0 

4.4 Comparing Landscape Types 

In order to make a basic comparison between NK, NKd(Pl) and NKd(K) land- 
scapes, we select the random interaction scheme for all landscape types and 
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Table 1. Results of comparison of landscape types. 



Landscape 


K 


I-Scheme 


I-Distr. 


WL 


F 


UBS (xlO-’*) 


MC (xl0“’*) 


CorrL 


NK 


1 


random 


- 


15.9 (3.5) 


0.696 (0.035) 


1.10 


-5.08 


8.32 (2.01) 




3 


random 


- 


14.9 (4.2) 


0.712 (0.032) 


0.861 


-3.83 


5.14 (0.91) 




8 


random 


- 


10.4 (3.6) 


0.699 (0.029) 


(0) 


-2.28 


2.21 (0.26) 


NKd(Pl) 


1 


random 


linear 


15.3 (3.5) 


0.692 (0.038) 


0.689 


-4.53 


8.21 (1.85) 




3 


random 


linear 


14.9 (4.2) 


0.712 (0.031) 


0.750 


-3.46 


5.33 (0.80) 




8 


random 


linear 


10.2 (3.5) 


0.700 (0.030) 


(0) 


-2.46 


2.22 (0.27) 


NKd(K) 


1 


random 


linear 


15.7 (3.6) 


0.686 (0.038) 


0.933 


-4.52 


8.26 (1.86) 




3 


random 


linear 


14.3 (4.5) 


0.701 (0.036) 


1.41 


-4.41 


5.01 (1.05) 




8 


random 


linear 


9.3 (3.6) 


0.678 (0.034) 


(0) 


-3.27 


2.29 (0.35) 



Landscape 


K 


I-Scheme 


I-Distr. 


Max F 


Evol. F 


Ratio 


nrEvals 


NK 


1 


random 


- 


0.788 


0.752 (0.014) 


0.952 


1558.8 (510.9) 




3 


random 


- 


0.799 


0.746 (0.031) 


0.933 


1702.8 (676.8) 




8 


random 


- 


0.795 


0.735 (0.016) 


0.919 


1386 (560.1) 


NKd(Pl) 


1 


random 


linear 


0.813 


0.719 (0.004) 


0.884 


1455 (567) 




3 


random 


linear 


0.802 


0.731 (0.017) 


0.911 


1254 (567) 




8 


random 


linear 


0.805 


0.727 (0.019) 


0.903 


996 (282) 


NKd(K) 


1 


random 


linear 


0.799 


0.732 (0.009) 


0.916 


1381.2 (451.8) 




3 


random 


linear 


0.823 


0.745 (0.015) 


0.905 


1531.2 (649.2) 




8 


random 


linear 


0.811 


0.731 (0.025) 


0.901 


1124.4 (483) 



OBSERVATIONS (from Table DJ: 

101 Generally the results for all landscapes are comparable. This implies that keeping K 
constant is equivalent to using a skewed distribution of K that amount to the same 
average K (this describes the difference between NK and NKd(K). Also, keeping 
K fixed for all genes, but enforcing an skewed distribution of pleiotropy values 
per gene is of no visible influence on the landscape (this describes the difference 
between NK and NKd(Pl)). 

□2 The NKd(K) landscape shows a slightly more prominent ‘Massif Central’ effect for 
higher (>1) K. 



linear the interaction distribution for both NKd landscapes. The results for K 
= 1,3 and 8 are shown in Table Q 

4.5 Comparing Interaction Schemes 

In order to make a basic comparison between the influence of the different inter- 
action schemes, we compare the different landscapes with identical settings, only 
changing the interaction scheme. We choose the linear interaction distribution 
for both NKd landscapes. The ‘random’ and ‘development’ interaction scheme 
were compared, since these are the only two available for the NKd(Pl) landscape. 
The results for K = 1,3 and 8 are shown in Table El 



4.6 Comparing Interaction Distributions 

In order to make a basic comparison between the influence of the different inter- 
action distributions, we compare the different landscapes with identical settings. 



Developmental Effects on Tuneable Fitness Landscapes 125 



Table 2. Results of comparison of interaction schemes. 



Landscape 


K 


I-Scheme 


I-Distr. 


WL 


F 


UBS (xlO-’*) 


MC (xl0“‘*) 


CorrL 


NKd(Pl) 


1 


random 


linear 


15.3 (3.5) 


0.692 (0.038) 


0.689 


-4.53 


8.21 (1.85) 




3 


random 


linear 


14.9 (4.2) 


0.712 (0.031) 


0.750 


-3.46 


5.33 (0.80) 




8 


random 


linear 


10.2 (3.5) 


0.700 (0.030) 


(0) 


-2.46 


2.22 (0.27) 




1 


dev 


linear 


15.9 (3.6) 


0.698 (0.040) 


0.592 


-4.11 


8.38 (1.78) 




3 


dev 


linear 


14.3 (4.1) 


0.703 (0.033) 


0.798 


-3.60 


5.45 (1.15) 




8 


dev 


linear 


9.5 (3.4) 


0.675 (0.029) 


(0) 


-2.32 


2.25 (0.31) 


NKd(K) 


1 


random 


linear 


15.7 (3.6) 


0.686 (0.038) 


0.933 


-4.52 


8.26 (1.86) 




3 


random 


linear 


14.3 (4.5) 


0.701 (0.036) 


1.41 


-4.41 


5.01 (1.05) 




8 


random 


linear 


9.3 (3.6) 


0.678 (0.034) 


(0) 


-3.27 


2.29 (0.35) 




1 


dev 


linear 


15.7 (3.2) 


0.667 (0.046) 


1.36 


-3.36 


7.98 (1.80) 




3 


dev 


linear 


15 (3.8) 


0.667 (0.039) 


1.85 


-4.81 


4.91 (1.13) 




8 


dev 


linear 


11.8 (3.8) 


0.648 (0.037) 


(0.967) 


-3.73 


2.26 (0.42) 



Landscape 


K 


I-Scheme 


I-Distr. 


Max F 


Evol. F 


Ratio 


nrEvals 


NKd(Pl) 


1 


random 


linear 


0.813 


0.719 (0.004) 


0.884 


1455 (567) 




3 


random 


linear 


0.802 


0.731 (0.017) 


0.911 


1254 (567) 




8 


random 


linear 


0.805 


0.727 (0.019) 


0.903 


996 (282) 




1 


dev 


linear 


0.803 


0.740 (0.007) 


0.922 


1401 (366) 




3 


dev 


linear 


0.823 


0.719 (0.010) 


0.874 


1590 (789) 




8 


dev 


linear 


0.783 


0.716 (0.021) 


0.914 


1419 (669) 


NKd(K) 


1 


random 


linear 


0.799 


0.732 (0.009) 


0.916 


1381.2 (451.8) 




3 


random 


linear 


0.823 


0.745 (0.015) 


0.905 


1531.2 (649.2) 




8 


random 


linear 


0.811 


0.731 (0.025) 


0.901 


1124.4 (483) 




1 


dev 


linear 


0.764 


0.632 (0.007) 


0.827 


1302 (309) 




3 


dev 


linear 


0.774 


0.675 (0.020) 


0.872 


1341 (336) 




8 


dev 


linear 


0.773 


0.700 (0.024) 


0.906 


1461 (516) 



OBSERVATIONS (from Table 0: 

01 In neither of the two tested NKd landscapes does the changing of the interaction 
scheme provide any substantial change in the landscape properties. 

The NKd(K) landscape with ‘development’ interaction scheme and ‘linear’ distri- 
bution shows slightly longer average walk lengths to an optimum and relatively 
strong ‘Massif Central’ effect for higher (>1) K. The effect for A = 1 is less 
prominent then for the other landscapes examined. 



only changing the interaction distribution. We choose the random interaction 
scheme for all landscapes. The ‘exponential’ and ‘square root’ interaction scheme 
were compared, since their shape differs the most. The results for K = 1,3 and 
8 are shown in Table 0 

5 Discussion 

General Experimental Conclusions: The general conclusion from the nu- 
merous experiments in the scope of this paper is that enforcing an skewed distri- 
bution on the incoming or outgoing epistatic interactions on NK-like landscapes 
is of hardly any influence to the overall landscape properties, and the perfor- 
mance of a standard GA on them. 
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Table 3. Results of comparison of interaction distributions. 



Landscape 


K 


I-Scheme 


I-Distr. 


WL 


F 


UBS (xlO-’*) 


MC (xl0“‘*) 


CorrL 


NKd(Pl) 


1 


random 


exp 


15.9 (3.6) 


0.693 (0.040) 


1.125 


-4.51 


8.71 (1.95) 




3 


random 


exp 


15.0 (4.2) 


0.714 (0.033) 


1.345 


-4.21 


5.22 (0.97) 




8 


random 


exp 


10.5 (3.6) 


0.700 (0.030) 


(0.734) 


-2.34 


2.24 (0.26) 




1 


random 


sqrt 


15.8 (3.5) 


0.692 (0.031) 


0.867 


-4.44 


8.11 (1.63) 




3 


random 


sqrt 


14.7 (4.3) 


0.712(0.032) 


0.944 


-3.52 


5.02 (0.94) 




8 


random 


sqrt 


10.3 (3.7) 


0.699 (0.030) 


(1.754) 


-2.75 


2.41 (0.29) 


NKd(K) 


1 


random 


exp 


15.7 (3.7) 


0.692 (0.035) 


0.934 


-4.29 


8.08 (1.55) 




3 


random 


exp 


14.2 (4.4) 


0.694 (0.038) 


1.372 


-4.84 


4.92 (0.97) 




8 


random 


exp 


9.0 (3.7) 


0.671 (0.038) 


(0) 


-3.94 


2.57 (0.49) 




1 


random 


sqrt 


15.8 (3.6) 


0.693 (0.030) 


0.821 


-4.21 


8.12 (1.72) 




3 


random 


sqrt 


14.3 (4.1) 


0.708 (0.031) 


1.176 


-3.55 


5.03 (0.98) 




8 


random 


sqrt 


10.0 (2.6) 


0.691 (0.033) 


(0) 


-2.83 


2.32 (0.32) 



Landscape 


K 


I-Scheme 


I-Distr. 


Max F 


Evol. F 


Ratio 


nrEvals 


NKd(Pl) 


1 


random 


exp 


0.781 


0.631 (0.013) 


0.808 


1479 (531) 




3 


random 


exp 


0.813 


0.734 (0.011) 


0.903 


1254 (513) 




8 


random 


exp 


0.792 


0.732 (0.026) 


0.924 


1368 (588) 




1 


random 


sqrt 


0.783 


0.734 (0.009) 


0.937 


1548 (531) 




3 


random 


sqrt 


0.813 


0.745 (0.027) 


0.916 


1728 (717) 




8 


random 


sqrt 


0.793 


0.726 (0.022) 


0.916 


1503 (663) 


NKd(K) 


1 


random 


exp 


0.782 


0.776 (0.017) 


0.992 


1494 (498) 




3 


random 


exp 


0.809 


0.731 (0.014) 


0.904 


1314 (447) 




8 


random 


exp 


0.786 


0.721 (0.015) 


0.917 


1308 (762) 




1 


random 


sqrt 


0.767 


0.709 (0.008) 


0.924 


1299 (327) 




3 


random 


sqrt 


0.800 


0.736 (0.021) 


0.920 


1407 (597) 




8 


random 


sqrt 


0.826 


0.736 (0.027) 


0.891 


1206 (636) 



OBSERVATIONS (from Table 0: 

01 Changing the interaction curve in any of the two NKd landscapes does not cause 
a considerable change in the landscape properties. 

02 The NKd(K) landscape with ‘random’ interaction scheme and ‘exponential’ in- 
teraction distribution shows a relatively strong ‘Massif Central’ effect for higher 
(>1) K. 



Keeping the above in mind as a general conclusion, in some cases the NKd(K) 
landscape showed to have somewhat deviating properties from NK and NKd (PI) 
landscapes. In a few cases the ‘Massif Central’ effect was slightly more prominent, 
and the mean distance to a local optimum was slightly higher then for other 
landscapes with comparable settings. Bear in mind that NKd(K) landscapes 
are the only landscape variant where the value of K per bit is not constant. 
Apart from that, the maximum pleiotropy per gene is not limited by an induced 
distribution (as in NKd(Pl)), but mere limited to N—1. This leads us to conclude 
that forcing a part of the genotype to lower K values then the average, renders 
the landscape somewhat smoother and more structured, while doing the same 
for the pleiotropy values of genes showed to have no effect. 
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Relation to Development in EC: Interpreting the results in a broader per- 
spective we can conclude that the properties of fitness landscapes that were in- 
vestigated depend on the average value of K. They generally do not depend on 
the way the values of K per gene are distributed on the genotype (the interaction 
distribution from the experiments), neither on the way the K influencing genes 
are chosen from the N — 1 possible genes. For NK landscapes this conclusion has 
already been drawn Kauffman [ Kau9dl . 

Computational models of development mappings between genotype and phe- 
notype often consist of rule- type genes. In order to develop a given genotype 
(which is a collection of rules), the rules are sequentially applied to a starting 
structure (equivalent to a zygote) to obtain a final phenotype (for an example 
see [ lvK,l;IVIl)^ L In this context our results imply that: 



1. (NKd(Pl) = NK) If the fitness contribution of a rule-gene depends on the 
activation of a constant amount of other rule-genes, then the position on the 
genotype of the influencing rule-genes is of no influence on the evolutionary 
properties of the genotype. In the scope of the definition of our model this 
actually means that from an evolutionary point of view it is of no influence 
whether the influencing rule-genes have already been activated, or still have 
to be activated in the future. 

2. (NKd(K) = NK) A genotype in which the fitness contributions of a rule-gene 
depend on a skewed distribution of on average K other rule-genes, then this 
genotype is evolutionary equivalent to a genotype in which each rule-gene 
depends on exactly K others. 



Of course, these conclusions depend on the correctness of the proposed model. 
We conclude the discussion with some critical remarks to the proposed model, 
which will be the basis of future work: First of all, the fitness contributions of 
all genes are between 0 and 1, which is based on the NK model. In a develop- 
mental model this is unlikely to be the case, which suggests extending the model 
to genes with different maximum fitness contributions. Secondly, although it is 
interesting to use a model where it is possible to reason based on the fitness 
contribution of individual genes as a starting point, this is much harder in a 
computational development model, since the phenotype is built using a complex 
system of interacting rules. Hence numerically valuing the fitness contribution 
becomes virtually impossible. And as a third and final remark, it is an open 
question under which conditions the dependencies between genes follow a cer- 
tain type of distribution (exponential, linear, . . . ). Possibly there is a link with 
phenotype modularity, but this is still under investigation and at this point 
highly speculative. 
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Abstract. It is clear to all, after a moments thought, that nature has much we 
might be inspired by when designing our systems, for example: robustness, 
adaptability and complexity, to name a few. The implementation of bio-inspired 
systems in hardware has however been limited, and more often than not been 
more a matter of artistry than engineering. The reasons for this are many, but 
one of the main problems has always been the lack of a universal platform, and 
of a proper methodology for the implementation of such systems. The ideas 
presented in this paper are early results of a new research project, "Reconfigur- 
able POEtic Tissue". The goal of the project is the development of a hardware 
platform capable of implementing systems inspired by all the three major axes 
(phylogenesis, ontogenesis, and epigenesis) of bio-inspiration, in digital hard- 
ware. 



1 Introduction 

The implementation of bio-inspired systems in silicon is quite difficult, due to the 
sheer number and complexity of the biological mechanisms involved. Conventional 
approaches exploit a very limited set of biologically-plausible mechanisms to solve a 
given problem, but often cannot be generalized because of the lack of a methodology 
in the design of bio-inspired computing machines. This lack is due to the heterogene- 
ity of the hardware solutions adopted for bio-inspired systems, which is itself due to 
the lack of architectures capable of implementing a wide range of bio-inspired 
mechanisms. 

The aim of this paper is to present some of the preliminary ideas developed in the 
framework of a new research project, called "Reconfigurable POEtic Tissue" (or 
"POEtic” for short), recently started under the aegis of the European Community. 
After a short introduction to the POE model for the design of bio-inspired hardware, 
the paper will present an outline of the POEtic project and of the main ideas behind 
the development of the tissue, to finish with some of the preliminary results in the 
form of an overall architecture designed for the implementation of bio-inspired sys- 
tems. 
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2 The POE Model 

The goal of the POEtic project is the 

development of a flexible computational substrate inspired by the evolution- 
ary, developmental and learning phases in biological systems.” 

Biological inspiration in the design of computing machines finds its source in es- 
sentially three biological models [1]: phylogenesis (P), the history of the evolution of 
the species, ontogenesis (O), the development of an individual as orchestrated by his 
genetic code, and epigenesis (E), the development of an individual through learning 
processes (nervous system, immune system) influenced both by the genetic code (the 
innate) and by the environment (the acquired). These three models share a common 
basis: the genome. 

What follows is a very concise and simplified description of the three models and 
of their application to the conception of electronic hardware. A much more detailed 
analysis of the three models and of existing hardware implementations, in the form of 
internal project reports, is available online on the POEtic project website [2]. 



2.1 Phylogenesis 

On the phylogenetic axis we find systems inspired by the processes involved in the 
evolution of a species through time, i.e. the evolution of the genome. The process of 
evolution is based on alterations to the genetic information of a species, occurring 
through three basic mechanisms: selective reproduction, crossover and mutation. 

These mechanisms are, by nature, non-deterministic. This represents both their 
strength and their weakness, when applied to the world of electronics. It is a strength, 
because they are fundamentally different from traditional algorithms and thus are 
potentially capable of solving problems which are intractable by deterministic ap- 
proaches. It is a weakness, because computers are inherently deterministic (it is very 
difficult, for example, to generate a truly random number, a basic requirement for 
non-deterministic computation, in a computer). 

Even with this disadvantage, algorithms that exploit phylogenetic mechanisms are 
carving themselves a niche in the world of computing. These algorithms, commonly 
referred to as evolutionary algorithms (a label that regroups domains such as genetic 
algorithms [3], evolutionary programming [4], and genetic programming [5]), are 
usually applied to problems which are either too ill-defined or intractable by determi- 
nistic approaches, and whose solutions can be represented as a finite string of symbols 
(which thus becomes the equivalent of the biological genome). An initial, random 
population of individuals (i.e., of genomes), each representing a possible solution to 
the problem, is iteratively “evolved” through the application of mutation and cross- 
over. The resulting sequences are then evaluated on the basis of their efficiency in 
solving the given problem (fitness function) and the best solutions are in turn evolved. 
This approach, is not guaranteed to find the best possible solution to a given problem, 
but can often find an “acceptable” solution more efficiently than deterministic ap- 
proaches. 

It appears, then, that the phylogenetic axis has already provided a considerable 
amount of inspiration to the development of computer systems. To date, however, its 
impact has been felt mostly in the development of software algorithms, and only mar- 
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ginally in the conception of digital hardware. Koza et al. pioneered the attempt to 
apply evolutionary strategies to the synthesis of electronic circuits when they applied 
genetic algorithms to the evolution of a three-variable multiplexer and of a two-bit 
adder [5]. Also, evolutionary strategies have been applied to the development of the 
control circuits for autonomous robots [6], [7]. Other research groups are active in this 
domain [8], [9] however, technical issues have posed severe obstacles to the devel- 
opment of evolvable electronic hardware [10]. A possible solution for the POEtic 
project is given in [11]. 



2.2 Ontogenesis 

The ontogenetic model concerns the development of a single multi-cellular biological 
organism. This process exploits a set of very specific mechanisms to define the 
growth of the organism, i.e. its development from a single mother cell (the zygote) to 
the adult phase. The zygote divides, each offspring containing a copy of the genome 
(cellular division). This process continues (each new cell divides, creating new off- 
spring, and so on), and each newly formed cell acquires a functionality (e.g., liver 
cell, or epidermal cell) depending on its surroundings, i.e., its position in relation to 
its neighbors (cellular differentiation). 

Cellular division is therefore a key mechanism in the growth of multi-cellular 
organisms, impressive examples of massively parallel systems: the 6xl0'^ cells of a 
human body, each a relatively simple elements, work in parallel to accomplish ex- 
tremely complex tasks (the most outstanding being, of course, intelligence). If we 
consider the difficulty of programming massively parallel computers, biological inspi- 
ration could provide some relevant insights on how to handle massive parallelism in 
silicon. 

A fundamental feature of biological organisms is that each cell contains the blue- 
print for the entire organism (the genome), and thus can potentially replace any other 
cell [12]: no single cell is indispensable to the organism. In fact, cells are ceaselessly 
being created and destroyed in an organism, a mechanism at the base of one of the 
most interesting properties of multi-cellular organisms: healing. 

The ontogenetic axis has been almost completely ignored by computer scientists, 
despite a promising start in the fifties with the work of John von Neumann, who de- 
veloped a theoretical model of a universal constructor, a machine capable of con- 
structing any other machine, given its description [13]. Given a description of itself, 
the universal constructor can then self-replicate, a process not unlike cellular division. 

Unfortunately, electronic circuits in the 1950s were too primitive to allow von 
Neumann's machine to be realized, and the concept of self-replicating machines was 
thus set aside for a long time. Probably the main obstacle to the development of self- 
replicating machines was the impossibility of physically creating self-replicating 
hardware. In fact, such a process requires a means to transforming information (i.e. 
the description of a machine) into hardware, and such a means was unpractical until 
the introduction of programmable logic circuits (FPGAs), which provide a partial 
solution to the problem. 

The difficulty of implementing processes such as growth in silicon has limited re- 
search on this model. The Logic Systems Laboratory at EPFL began to investigate 
ontogenesis of electronic circuits with its Embryonics project [14]. Another aspect of 
ontogenesis, morphogenesis (that is, the generation of the shape of an organism) has 
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been the subject of some research [15], [16], and recently the issue of development is 
attracting growing attention as a source of inspiration for artificial systems [17]. More 
details for this project can be found in [18]. 



2.3 Epigenesis 

The human genome contains approximately 3x10’ bases. An adult human body con- 
tains around 6xl0‘^ cells, of which approximately 10‘° are neurons, with lO'’ connec- 
tions. In this case, and in many complex organisms, the genome cannot contain 
enough information to completely describe all the cells and synaptic connections of 
an adult organism. 

There must therefore exist a process that allows the organism to increase in com- 
plexity as it develops. This process can be labeled epigenesis, and includes the devel- 
opment of the nervous, immune, and endocrine systems. 

Epigenetic mechanisms have already had considerable impact on computer sci- 
ence, and particularly on software design, notably through the concept of learning. 
The parallel between a computer and a human brain dates to the very earliest days of 
the development of computing machines, and led to the development of the related 
fields of artificial intelligence [19] and artificial neural networks. 

All living organisms interact with their environment and are capable of responding 
to sensory inputs. In many cases this behaviour is learnt over a period of time, after 
which a specific stimulus will trigger the same response each time, sometimes the 
response may depend on the context of the stimulus. It is now recognised that such 
behaviour is controlled by neurones or nerve cells and their interactions. The majority 
of neurones underlying learning are spiking neurones - they communicate through 
neuronal spike trains. In this project we will focus on the use of spiking (or pulsed) 
neuronal models to implement learning [20]. These devices are based on leaky inte- 
grate-to-threshold and fire neurone models, we will adapt these for implementation in 
hardware. Learning in spiking neural networks usually takes inspiration from Hebb’s 
ideas, some recent work has suggested that learning in spiking neural networks may 
be possible using very coarse grained synaptic weightings [21], which are particularly 
suited to hardware implementation. A view on the way forward for the POEtic project 
is given in [22]. 

A more recent addition in the general area of hardware systems and epigenetic 
processes are artificial immune systems [23]. Here the sophisticated mechanisms 
associated with “fault tolerance” in nature have been adapted for electronic hardware 
system designs. 



3 The POEtic Project 

While each of the three models, taken separately, has to a greater or lesser extent been 
used as a source of inspiration for the development of computing machines, their 
amalgamation into a single system is a challenge yet to be met. The aim of the POEtic 
project [2] is the development of a computational substrate optimized for the imple- 
mentation of digital systems inspired by all of the three above-mentioned models. 

The POEtic tissue is a multi-cellular, self-contained, flexible, and physical sub- 
strate designed to interact with the environment, to develop and dynamically adapt its 
functionality through a process of evolution, growth, and learning to a dynamic and 
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partially unpredictable environment, and to self-repair parts damaged by aging or 
environmental factors in order to remain viable and perform similar functionalities. 

Following the three models of bio-inspiration, the POEtic tissue will be designed 
as a three-layer structure (figure 1): 

• The phylogenetic model acts on the genetic material of a cell. Each cell can contain 
the entire genome of the tissue. Typically, in the architecture defined above, it 
could be used to find and select the genes of the cells for the genotype layer. 

• The ontogenetic model concerns the development of the individual. It should act 
mostly on the mapping or configuration layer of the cell, implementing cellular 
differentiation and growth. In addition, ontogenesis will have an impact on the 
overall architecture of the cells where self-repair (healing) is concerned. 

• The epigenetic model modifies the behavior of the organism during its operation, 
and is therefore hest applied to the phenotype layer. 




EPIGEINESIS 






ONTOGENESIS < 



PHYLOGENESIS<( 



Fig. 1. The three organizational layers of the POEtic project. 



Defining separate layers for each model has a number of advantages, as it allows 
the user to decide whether to implement any or all models for a given problem, and 
lets the structure of each layer to be adapted to the model. This adaptability, which 
will be detailed in the next section, is achieved by implementing the cells on a mo- 
lecular substrate, in practice a surface of programmable logic. 

Beside this combination of the three main axes of biological self-organization, the 
proposed tissue presents two more innovative hardware-oriented aspects: 

• a layered hardware structure that matches the three axes of biological organization; 

• an input/output interface with the external world that allows each cells to perceive 
and modify its environment when and where necessary. 



4 POE Integration 

There are at least two advantages in a circuit that integrates phylogenesis, ontogene- 
sis, and epigenesis. The first is that each of the corresponding algorithms can be opti- 
mised for a particular class of problems. The second advantage comes from the com- 
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bination of the different self-organization modalities. For example, instead of combin- 
ing evolutionary algorithms with off-the-shelf learning algorithms, it will be possible 
to genetically encode and evolve the rules by which the phenotype plane of the tissue 
can dynamically adapt to external or internal perturbations without need of additional 
evolution. It has been shown that in the case of simulated neural networks, by evolv- 
ing the adaptation mechanisms instead of a fixed configuration pattern, the evolution- 
ary circuits cannot capitalize upon genetically determined solutions that fit a particu- 
lar environment, but are rather forced to evolve the ability to develop on the fly the 
structure required by their specific environment [24]. This is quite interesting because 
it potentially offers a solution to the strong dependency of conventional evolved cir- 
cuits to the evolutionary environment. 

4.1 Cellular Structure 

In contrast to conventional cellular electronic devices (such as Field Programmable 
Gate Arrays), each cell in the POEtic tissue contains the entire genome for the whole 
tissue. This solution, which is also closer to biological reality, increases the robust- 
ness of the system, allowing individual cells to reconfigure their function for growth, 
self-replication, and self-repair at a local level. 

The ontogenetic concept of growth has been largely ignored in the past in the de- 
velopment of bio-inspired systems. The reason is obvious: growth implies the "crea- 
tion" of new material during the lifetime of an organism (and therefore during the 
operational life of a machine), a feature which remains impossible to this day in the 
world of electronic circuits. 

However, while the silicon substrate of an electronic circuit cannot be modified, a 
reconfigurable circuit like the POEtic tissue allows its function to be defined after 
fabrication, and thus potentially modified at runtime. The tissue allows the realization 
of a growth process based on information rather than matter. 

The process of growth is based on cellular division and on cellular differentiation. 
From the standpoint of both biological organisms and electronic circuits, these two 
processes are made possible by the existence of the same genome inside each cell. 
Despite their structural similarity, each cell can specialize to perform a unique func- 
tion by "executing" only a specific part of the genetic program depending on the cell's 
location in the organism. This differentiation allows multi-cellular organisms, includ- 
ing the POEtic tissue, to display behaviors of a greater complexity than those of sin- 
gle-cell organisms. 

Einally, biological multi-cellular organisms are capable of surviving an astounding 
amount of damage. The "fault tolerance" of biological organisms is based on a set of 
extremely complex systems ranging from the organism-wide immune system to sin- 
gle-cell error correction. The ontogenetic features of the tissue will include a hierar- 
chy of fault-tolerance mechanisms, acting "behind the scenes" with little or no effect 
on the operation of the machine. 

4.2 Layered Organization 

The structure in figure 1, naturally accommodates the building blocks and processes 
that regulate biological life. The genetic material specifying the functionality of the 
circuit is stored in the genotype layer, the rules and mechanisms of gene expression 
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that map the genotype into a phenotype are stored and executed in the configuration 
layer, and the resulting function performed by each cell is implemented and executed 
in the phenotype layer. 

The explicit distinction in three layers allows the study and implementation of 
complex genotype-to-phenotype mappings that can exploit growth and reconfigura- 
tion properties of the tissue, reduce the genome length, and eventually improve the 
evolvability of the tissue. 

The mapping between genotypes and phenotypes is mainly a one-to-one function 
in artificial evolutionary systems. This relatively straightforward strategy does not 
capture the complex, but highly regulated, mechanisms of gene expression that char- 
acterise the development of biological organisms and barely supports the creation of 
genetic building blocks that could be exploited by crossover operators. At the same 
time, since in one-to-one mappings the length of the genotype is proportional to the 
complexity of the phenotype, large structures such as neuronal circuits are difficult to 
evolve. POEtic tissue will provide a substrate that naturally supports the implementa- 
tion in the configuration layer of smarter and more biologically inspired mechanisms 
of gene expression. This structure, together with the cellular organization of the tis- 
sue, requires the evolutionary mechanisms that go beyond conventional genetic algo- 
rithms in that they capitalize and leave more space to ontogenetic and epigenetic 
mechanisms of dynamic self-organization of the tissue. 

Finally, the presence of three distinct hardware layers allows the implementation of 
different mechanisms of self-organization in isolation without necessarily integrating 
them all. For example, one may decide to implement a given neuronal structure and 
its learning mechanisms on the circuit without going into evolution and gene expres- 
sion. Similarly, another may wish to implement conventional electronic functions and 
exploit only the self-configuration and self-repair properties of the circuit without 
resorting to evolution and learning. This flexibility of the proposed circuit will indeed 
be exploited in the project to study the different aspects of self-organization and 
gradually integrate them all in the final POEtic tissue. 



5 A Bio-inspired Architecture 

Adaptation and specialization are key features of all biological organisms. As a con- 
sequence, no bio-inspired architecture can be rigid, but must rather be able to adapt 
and specialize depending on the desired application. The POEtic tissue will exploit 
reconfigurable logic to the fullest extent to allow for these characteristics: the bio- 
inspired machines will consist of two-dimensional arrays of cells, where each cell is a 
small, fully reconfigurable, processing element. 

The role of the tissue is to provide a molecular substrate for the implementation of 
the cells: as organic cells are constituted by molecules, so artificial cells will be con- 
stituted by the programmable logic elements of the POEtic circuit. The three layers of 
the POEtic systems (figure 1) will all be implemented on this substrate. 

Integrating reprogrammability in an architecture, however, is far from trivial. The 
heterogeneity of the applications that can potentially be implemented poses consider- 
able design problems, particularly for the control of the processing elements. A novel 
kind of architecture needs to be developed to fully exploit the potential of reconfigur- 
able logic for bio-inspired systems. 




136 Andy M. Tyrrell et al. 



The three-layer architecture proposed in the POEtic tissue, together with its im- 
plementation on the molecular substrate, allows the hardware structure of each layer 
to be adapted to the bio-inspired mechanism that it implements. What follows is an 
outline of a plausible structure for each of the three layers, keeping in mind that the 
reprogrammability of the tissue will allow any or all of these layers to be essentially 
silent, should the bio-inspired mechanism concerned not be required. 



5.1 The Genotype Layer 

The genotype layer (figure 2) contains the genetic information of the organism, which 
consists essentially of: 

• A set of operators, which define the set of all possible functions for the cell. The 
quantity and functionality of the operators depends essentially on the application. 
For example, if the phenotype layer implements an array of neurons, one could 
imagine that the operators will consist of the internal parameters for the neuron, to- 
gether with a set of initial connection weights. Instead, if the phenotype imple- 
ments, say, a sound processing surface, the operators will consist of different filters 
and sound generation functions. 




Fig. 2. The genotype layer in an array of cells and the external controller for evolution. 

• A differentiation table, used to determine which operators will be implemented in 
which cell. The table contains a compact, perhaps redundant, representation of the 
operators to be implemented by the cells. The main advantage of maintaining a 
separate table storing this information is its size: the table will represent an ex- 
tremely compact representation of the structure of the system, and evolutionary 
mechanisms in hardware will be applied to the table, rather than to the operators 
themselves. The differentiation table also plays an important role in the mapping 
layer, as described below. 

The key feature of the genotype layer lies in the presence of a set of dedicated con- 
nections, used to provide user access to the differentiation tables of all the cells in the 
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system. The advantage of this kind of access lies in the presence of an on-chip micro- 
controller, whose function is to access the tables and to manipulate their contents 
through a set of user-defined evolutionary mechanisms. 

5.2 The Configuration or Mapping Layer 

The configuration layer of the POEtic systems is designed to implement the processes 
of cellular differentiation and cellular division (growth), the bases of the ontogenetic 
model (the fault-tolerance aspect of the model is also partially present in this layer, 
but is in general distributed throughout the layers of the array). 

In its simplest conception (figure 3), the function of the configuration layer is to se- 
lect which operator will be implemented by the cell. Each cell must be able to identify 
its position within the array. A set of coordinates, incremented in each cell and trans- 
mitted along the horizontal and vertical axes, is one approach. Once the cell has estab- 
lished its position, it can use this to select from the differentiation table which opera- 
tor to express. The operator will then be interpreted to generate the appropriate 
control signals for the phenotype layer. 

However, this approach is only a very basic solution to achieve cellular differentia- 
tion, and does not exploit many of the most interesting features of the development 
process. 

For example, the solution of figure 3 is limited to a one-to-one mapping between 
genotype and phenotype. From an evolutionary standpoint, however, this approach 
imposes considerable disadvantages in the application of phylogenetic mechanisms. 
More complex mappings, based on redundant coding, are being researched. 

Also, the coordinate-based approach is not well suited to the implementation of 
processes that want to draw inspiration from the natural process of growth, a key 
mechanism in the development of biological organisms. Automatic routing between 
cells will remove the need for a coordinate system, connectivity will be symbolic. 
More complex differentiation mechanisms, based on L-systems [16] and on cell- 
signalling gradients, are currently under study. 



NORTH 

NEIGHBOR PHENOTYPE 




Fig. 3. A possible stmcture for the configuration or mapping layer. 



5.3 The Phenotype Layer 

The phenotype layer is probably the most application-dependent layer of the three. If 
the final application is a "conventional" neural network, the phenotype layer of the 
cell will simply consist of an artificial neuron. The architecture of the neuron is a 
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choice left to the user. The POEtic project will concentrate on spiking neurons but this 
choice does not imply that the tissue will be limited to such a model. 

However, since the POEtic tissue is meant to allow the implementation of bio- 
inspired systems that do not necessarily involve exclusively neural-like cells, a more 
general-purpose phenotype architecture will be required. 

A simplified layout of a candidate architecture of the type that is being investigated 
in the POEtic project is shown in figure 4. It consists of two main units: 

• An execution unit, consisting of a set of application-dependent resources (e.g., 
adders, and counters). The resources are defined by the user at design time, and are 
accessed through a set of input and output ports. 

• In view of the massive amounts of connectivity of bio-inspired systems, the com- 
munication unit will probably have to rely on serial communication. The commu- 
nication unit is seen by the execution unit as just another resource, with an input 
and output port. 

This kind of architecture has two main advantages: it can support specialisation 
and adaptation through the modification of the resources depending on the application 
and it can be integrated in a design environment easily. 




5.4 A Design Methodology 

The definition of three separate layers for the different models of bio-inspiration has 
the advantage of being able to select which of the models are to be used for a given 
problem. This feature is made possible by the complete programmability of the sys- 
tems: all the three layers are implemented on the same substrate, a programmable 
digital logic circuit that incorporates dedicated features for all three models. The im- 
plementation of the layers in programmable logic adds the possibility of adapting the 
architecture of the system to the application, a flexibility that will allow the definition 
of an integrated design environment. Such an environment should include a custom- 
izable design flow allowing the user to concentrate on the desired approaches. A typi- 
cal application-driven design flow for a bio-inspired system could then be: 
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• Based on the application, the user defines which models will be implemented in the 
system. The user then accesses a library of operators and selects those that will be 
required by the application. The phylogenetic model can be applied at this stage 
both to the selection of the functions and to the evolution of novel functions to be 
added to the library. The presence of the epigenetic model is defined at this stage 
by selecting functions capable of implementing learning processes. 

• The environment will then integrate the desired functions into a totipotent cell, that 
is, a processing element capable of realizing any of the functions. This step repre- 
sents the implementation of the ontogenetic model. Other optional ontogenetic fea- 
tures, such as growth and self-repair, can be added automatically at this stage. 

• Finally, the design environment assembles the cells into a two-dimensional array, 
integrates the user-specified I/O modules, and generates a configuration for the 
programmable logic array that will implement the system. 



6 Conclusions 

Systems inspired by biological mechanisms are carving themselves a niche in several 
areas of computation, but the solutions adopted often cannot be generalised for the 
lack of a universal architecture capable of integrating the different approaches. 

The POEtic project does not necessarily propose such an architecture: what it pro- 
poses is an hardware substrate capable of implementing bio-inspired systems. Its aim 
is to develop the molecules necessary for the development of cellular systems. The 
usefulness and the practicality of the molecular surface will obviously be validated on 
a number of bio-inspired applications, ranging from neural networks to music synthe- 
sis. But beyond the applications developed within the project, the POEtic tissue will 
allow the implementation of a variety of bio-inspired systems, and it is our hope that 
the circuits created in this project will find users well beyond our group. This paper 
by no means presents the final version of the POEtic system: the project has started 
only recently, and the ideas presented here are simply early results of our research. 
We invite all readers to access our website [2], where we regularly post the progress 
reports and deliverables of the project. 
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Abstract. In this article, we introduce the approach to the realization 
of ontogenetic development and fault tolerance that will be implemented 
in the POEtic tissue, a novel reconhgurable digital circuit dedicated to 
the realization of bio-inspired systems. 

The modelization in electronic hardware of the developmental process 
of multi-cellular biological organisms is an approach that could become 
extremely useful in the implementation of highly complex systems, where 
concepts such as self-organization and fault tolerance are key issues. 
The concepts presented in this article represent an attempt at finding a 
useful set of mechanisms to allow the implementation in digital hardware 
of a bio-inspired developmental process with a reasonable overhead. 



1 Introduction 

The POEtic tissue is a self-contained digital integrated circuit aimed at the 
implementation of bio-inspired systems being developed in the framework of 
a new three-year research project, called “Reconhgurable POEtic Tissue” or 
“POEtic” for short (see the project’s website at http://www.poetictissue.org 
for more details), recently started under the aegis of the Information Society 
Technologies (1ST) program of the European Community, involving the Swiss 
Federal Institute of Technology in Lausanne (EPFL, Switzerland), the University 
of York (England), the Technical University of Catalunya (UPC, Spain), the 
University of Glasgow (Scotland), and the University of Lausanne (Switzerland). 

POEtic tissues are designed to implement a vast range of bio-inspired sys- 
tems, covering all the major axes of research in the domain (Phylogenesis, On- 
togenesis, and Epigenesis) [i7iibii7Hor>n] In this article, we will present the 
approaches we have selected to realize one of the three biological models that 
inspire circuit design: ontogenesis, that is, the development and growth of multi- 
cellular organisms, along with the fault tolerance that such structures imply. 

In the next section, we will introduce the main features and the global archi- 
tecture of POEtic systems as developed in our project, with particular emphasis 
on those aspects that are directly involved in the ontogenetic axis. We will then 
examine in some detail the two main areas of research currently under study for 
this part of the project, namely development and fault tolerance, and present 
our solutions for their implementation in silicon. 
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Fig. 1. The organizational layers of an electronic organism. 



2 POEtic Hardware 

As the general features of POEtic systems have been detailed elsewhere pi DI21)J . in 
this section we will concentrate on those aspects of the architecture most relevant 
to the topics of the article, i.e., ontogenetic development and fault tolerance. 

POEtic systems draw inspiration from the multi-cellular structure of complex 
biological organisms. As in biology, at the heart of our systems is a hierarchical 
structure (Fig. Q ranging from the molecular to the population levels. In par- 
ticular, the subjects of this article relate mainly to the cellular and organismic 
levels, with a brief foray into the molecular level for fault tolerance. 

It is important to note, in this context, that the only hardware (and hence 
fixed) level of the POEtic systems is the molecular level, implemented as a novel 
digital FPGA. All other levels are configurations for this FPGA, and thus can 
be seen as configware, that is, entities that can be structurally programmed by 
the user. As a consequence, most of the mechanisms described in this article, 
and notably ontogenetic development and cellular fault tolerance, can be altered 
(or removed) to fit a particular application or a novel research approach. 

The current (being configware, it could be modified in the future as new 
solutions are researched) architecture of the POEtic cells is described in Fig. El 
It consists of three logical layers (physically, of course, the three layers are “flat- 
tened” onto the molecular FPGA), each with a specific function within the do- 
main of bio-inspired systems. In this article, we shall concentrate on the mapping 
layer, as, together with the differentiation table, it is the layer directly concerned 
by the ontogenetic development of our machines (fault tolerance, as we shall see, 
is involved in the system at all levels). 

To illustrate with a practical example the structure of POEtic circuits, we 
shall use a very simple application: a timer (Fig. El) capable of counting minutes 
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Fig. 2. The three logical layers of an electronic cell, mapped on the molecular tissue. 
The use of a programmable circuit allows the layers to be adapted to each application. 
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Ml = Tens of minutes 
MO = Units of minutes 
SI = Tens of seconds 
SO = Units of seconds 
CK = Operating clock 



Fig. 3. Cellular implementation of a simple four-digit timer. 



and seconds. This architecture has the twin advantage of being conceptually very 
simple and of being easily broken down into a cellular structure. 

The first step in the design of a POEtic application is in fact to identify 
a logical division into cells. For this example, the division is obvious: tens of 
minutes, units of minutes, tens of seconds, units of seconds, which incidentally 
also correspond to the four outputs of the system (while the only input is the IHz 
synchronization clock). In general, the difficulty of this step depends of course 
on the application, and is probably the main criterion to determine whether an 
application is amenable to being implemented in a POEtic circuit. 

Next, it is necessary to identify, on the basis of the cellular structure of the 
application, the operators or functions required by the cells of the system. For 
the timer, a minimum of two operators is required: a counter by six (for the 
tens of seconds and minutes) and a counter by ten (for the units of seconds and 
minutes). In addition, the communication pattern of the cells must also be fixed: 
for the timer of Fig. 0 the output of the counters is sent to the north, while the 
synchronization signals are sent out to the west. 

At this stage, the architecture of the cells can be defined. This step implies 
essentially a design choice to decide where to place the complexity of the system. 
In fact, in a typical example of hardware/software co-design, the complexity 
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can reside in the genome and in its interpreter, Embryonics-style El, or in the 
phenotype. For example, in the case of the timer, the first approach would require 
a genome implemented by a structured program, used to control a simple 4-bit 
register in the phenotype. Or, in the second approach, the genome could well be 
reduced to a single number, used as a parameter for a programmable counter in 
the phenotype layer. Probably a more efficient solution in this case, this second 
approach would practically eliminate the requirement for an interpreter of the 
genome, and shift all the (in this case limited) complexity to the phenotype 
layer. 

Of course, the set of operators represents only part of the genetic information 
of the organism, the rest being made up by the differentiation table used to assign 
operators to the cells in the system. The definition of a mechanism to implement 
differentiation in a POEtic system is the subject of the next section. 

3 Development in the POEtic Tissue 

The modelization in electronic hardware of the developmental process of multi- 
cellular biological organisms is a challenging task, for reasons that should be 
obvious: on one hand, the sheer complexity of even the simplest multi-cellular 
biological organisms is orders of magnitude beyond the current capability of 
electronic circuits, and on the other hand the development of such organisms 
implies, if not the creation, at least the transformation of matter in ways that 
cannot yet be realized in digital or analog hardware. 

And yet these same considerations make such a process extremely interesting 
for hardware design: a developmental approach could well provide a solution, 
via the concept of self-organization, to the design of electronic circuits of a 
complexity beyond current layout techniques, and introduce into electronics at 
least some of the astounding fault tolerance of multi-cellular organisms, while an 
approximation of the growth process of biological entities could be an invaluable 
tool to introduce adaptability and versatility to the world of electronics. 

Setting aside, for the moment, the concept of fault tolerance, in this section 
we will describe two different approaches to the twin mechanisms of cellular 
division and cellular differentiation, mechanisms at the basis of biological devel- 
opment. 

This section will introduce two solutions to the implementation of the second 
mechanism: cellular differentiation, that is, the mechanism that allows a cell to 
determine its function within the organism (in other words, the architecture of 
the mapping layer of our cell). As far as cellular division is concerned, we are 
currently working on the development of some algorithms allowing our circuit to 
implement a rough approximation of this mechanism (see the “Implementation 
Issues” subsection for more details). 
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3.1 Coordinate-Based Development 

Probably the simplest approach to cellular differentiation in an architecture such 
as that of the POEtic circuits is to assign to each cell in the system a unique 
coordinate (in fact, for two-dimensional structures, a set of [X,Y] coordinates). 
The cell can then determine its function by selecting an operator depending on 
its (unique) coordinates. This is the approach used in the Embryonics project, 
and has been described in much detail elsewhere [ II I II MJ . 

Applied to the timer example (Fig. this approach would require three 
different kinds of functional cell (count by 6, count by 10, and pass-through), 
and the timer can be realized by four cells (out of the 15 that could fit in the 
circuit in this example). The computation of the coordinates is quite simple, 
implemented by incrementing the coordinate in each cell. It is then trivial to 
define a differentiation table to assign an operator to each set of coordinates. 



I/O PORTS 

Ml = Tens of minutes 
MO = Units of minutes 
SI = Tens of seconds 
SO = Units of seconds 
CK = Operating clock 
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Fig. 4. A possible layout for the timer in a coordinate-based development system. In 
this system, the position of each cell in the array is determined at design time. 
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It might be worthwhile to mention that in the Embryonics approach the 
complexity of the system resides mostly in the genome, implemented as an exe- 
cutable program. As a result of this design choice, the interpreter for the genome 
is the most complex part of the cell, while the phenotype layer is limited to a sin- 
gle register plus some rudimentary connection network. The POEtic approach, 
insofar as a coordinate-based development is concerned, will investigate a wide 
range of tradeoffs between the complexity of the genome and the complexity of 
the phenotype, as described above for the timer example. 

3.2 Gradient-Based Development 

The coordinate-based approach described above has a number of advantages, and 
notably its simplicity on the one hand, and the fact that it has been extensively 
used and tested in the Embryonics project over many years on the other. In a 
way, as long as the systems to be implemented are completely pre-determined 
(i.e., as long as the entire development of the organism is fully specified in the 
genome, as is the case, for example, of the caenorhabditis elegans [IE]), it is also 
the most efficient and complete, and in fact any other approach can be reduced 
to a coordinate-based approach during development. 
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However, the growth and development of complex multi-cellular organisms 
in nature is influenced by the environment from the earliest stages. While a 
problem from the point of view of an electronic implementation, this feature 
can lead to extremely interesting systems capable of adapting to environments 
unknown at design time. 

In order to illustrate how such adaptation could be applied to digital cir- 
cuits, we shall exploit, for clarity’s sake, the same example used above for the 
coordinate-based approach, that is, the timer. It should be noted, however, that 
such a simplistic example is far from the ideal candidate for an approach designed 
to address the issue of adaptation: a much more interesting held of application 
are neural networks, systems that are inherently capable of exploiting rich in- 
teractions with the environment. 

To come back to the simple example of the 4-digit timer, let us then assume 
that the circuit has to operate in an a priori unknown environment. The envi- 
ronment of a timer is, of course, extremely limited, and could be seen, in the 
most complex case, as consisting of one input port for the operating frequency 
of the timer, and of four output ports for the four digits. An unknown envi- 
ronment could then be an environment where the exact position of these input 
and output ports is not known at design time (once again, this is an extremely 
unlikely scenario in the design of a timer, but makes much more sense from the 
point of view of, for example, the control of a robot). A coordinate-based system 
is not capable of handling such a changing environment, since coordinates are 
defined as a function of the position of the cells within the circuit, independently 
of the external conditions. A more versatile approach is then required, capable 
of assigning a function to a cell depending not only on the internal configuration 
of the circuit, but also of the environment in which the circuit operates. Again 
drawing inspiration from biology, our choice has fallen on an approach based on 
the protein gradients that direct the development of organisms in nature. 

The concept of protein gradients has been examined in detail in many pub- 
lications, for biological Hin as well as electronic IMI organisms. Essentially, 
a gradient-based system consists of a set of diffusers that release a given protein 
into the system. The concentration of the protein in the system is highest at 
the diffuser, and decreases with the distance. Cells are assigned a functionality 
depending on the proteins’ concentration, and hence on the distance from the 
proteins’ diffusers, implementing a cellular differentiation based on their position 
relative to the diffusers rather than on their coordinates within the system. 

For the timer example, we can consider that each of the five I/O ports men- 
tioned above is a diffuser for a different kind of protein (again, we chose this 
solution for the sake of clarity, as in reality there is a tradeoff between the num- 
ber of different proteins and the complexity of the mapping circuit). It is then 
relatively simple to design a cell that selects its function depending on the con- 
centration of the appropriate protein. For example (Fig. EJ, a simple algorithm 
could place the timer so that the rightmost cell is contiguous to the input port for 
the operating frequency and use the other cells in the circuit to create paths from 
the four cells to the four output ports, following the corresponding gradients. 
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Fig. 5. Two possible layouts for the timer in a gradient-based development system. The 
position of each cell in the array is determined by the system during conhguration. 



Even in such a simplistic example, the versatility of a gradient-based system 
should be obvious. This kind of approach can also exploit the other axes of bio- 
inspiration to increase the adaptability of multi-cellular systems (aside from the 
obvious interest for adaptive systems such as neural networks, evolution can be 
used, for example, to define the position within the system of some emitters, 
allowing the genome to partially direct the development of the organism) . 



3.3 Implementation Issues 

We are actively pursuing research on both of the developmental approaches 
described above. As we mentioned, for systems that can be completely defined in 
the genome at design time, a coordinate-based system would be more efficient. 
On the other hand, for adaptive systems where the environments influences 
development, a gradient-based approach provides a much more versatile tool. 

In parallel, we are also investigating more complex, richer mappings between 
the genotype and the phenotype. The Embryonics approach, for example, re- 
quires a rigid one-to-one mapping from coordinates to function. A much more 
interesting approach exploits the possibility of using partially-defined differen- 
tiation tables that are accessed not through a one-to-one mapping, but rather 
through more complex algorithms (e.g., by computing the Hamming distance 
between the coordinates or the protein concentrations and the table entries). 

The use of partially-defined differentiation tables, coupled with the possibility 
of environment-directed development (e.g., through protein diffusers), also opens 
the possibility of mechanisms that approximate the growth process in biological 
organisms. If the organism’s structure is only partially defined in the genome, 
and if the environment is used to define the cells’ function, then an informational 
(rather than physical, as in nature) growth process can be realized. 
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4 Fault Tolerance in the POEtic Tissue 

Even if the complexity of digital hardware remains orders of magnitude behind 
that of the more developed biological organisms, some of the issues that nature 
had to confront during the millions of years required for the evolution of such 
organisms are gaining interest for electronics. For example, as we mentioned 
earlier, silicon might well be replaced, in a not-so-distant future, by molecular-size 
components with densities beyond what could be handled through current layout 
techniques. Among the many challenges represented by such novel hardware, 
one in particular stands out: these incredibly small components will never be 
“perfect”. In other words, circuits will inevitably contain faulty elements. 

Nature, faced with this problem, has developed extremely efficient solutions. 
Fault tolerance is definitely one domain where nature is vastly more advanced 
that electronics, and one of our goals is to draw inspiration from biological heal- 
ing and repair mechanisms to achieve stronger fault tolerance in digital hardware. 

From a “practical” standpoint, it is important to note that fault tolerance 
implies in fact two different processes: self-test, to detect that a fault has occurred 
within the circuit, and self-repair, to somehow allow the circuit to keep operating 
in the presence of one or more faults. The two processes are in fact very distinct, 
both conceptually and in practice. In particular, self-test implies that the circuit 
be capable of detecting incorrect operation and, since the fault will eventually 
have to be neutralized, the exact site where the incorrect operation occurs. Self- 
repair, on the other hand, implies that all parts of the circuit must be replaceable, 
physically or at least logically, via some sort of reconfiguration. 

There exist, for both processes, many “conventional” approaches 
but no really efficient off-the-shelf solution. Drawing once again inspiration from 
nature, where “fault tolerance” is achieved through not one, but a set of mecha- 
nisms ranging from molecular repair to complex organism-level immune systems, 
we are developing a hierarchical approach to fault tolerance, spanning all the lev- 
els (Fig. P of our POEtic systems. 

4.1 Molecular- Level Fault Tolerance 

Molecular-level fault tolerance is probably the most sensitive area of our hierar- 
chical approach, as the molecules (and hence their test and repair mechanisms) 
represent the hardware of our system, and thus will necessarily have to be fixed 
at design time (rather than being modifiable by configware as the other layers). 

However, the hardware layer of the POEtic tissue has not yet been fully 
designed (it represents the main research effort under way at the moment within 
the project) and, by their nature, logic level test and repair mechanisms require 
a completed layout to be efficiently implemented. 

In general, the fault tolerance mechanisms implemented for the POEtic tissue 
will bear a loose resemblance to those used for Embryonics designs [ ll I II 8J . at 
least in that they will rely on a hybrid approach mixing mechanisms such as 
duplication (a mechanism found in nature in the DNA’s double helix and in its 
error-correcting processes) and memory testing m- 
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Fig. 6. Self-repair through reconfiguration in a coordinate-based system. As each cell is 
totipotent, reconfiguration consists simply of a re-computation of the cells’ coordinates. 



4.2 Cellular-Level Fault Tolerance 

The coarser grain of cells with respect to molecules can be exploited to introduce 
more complex test and repair patterns. Fault tolerance at the processor level 
(cells can be viewed, for testing purposes, as small processing units) is usually 
implemented through techniques radically different from the logic level, and more 
complex reconfiguration patterns can be realized by exploiting the development 
approaches implemented in the POEtic circuits. 

For example, testing at the processor level can take advantage of the presence 
of multi-bit busses to test the correct operation of the machine through tech- 
niques such as parity-checking. As far as self-repair is concerned, reconfiguration 
mechanisms can exploit the features of the POEtic architecture to simplify the 
rearrangement of tasks within the array: the presence of the full genome in each 
cell, coupled with the developmental mechanism that assigns the function to the 
cells depending on local conditions, can for example allow a relatively simple 
implementation of on-line self-repair via reconfiguration (Fig.|^. 

Of course, it should also be noted that many of the systems that POEtic 
architectures are meant for, and notably neural networks, are inherently fault 
tolerant at the cellular level: the death of a neuron within the system will force 
the learning algorithms to automatically avoid the faulty neuron and to find a 
solution that can perform the desired computational task in a reduced network. 

4.3 Organismic-Level Fault Tolerance 

A simplistic vision of organismic-level fault tolerance consists of exploiting the 
developmental approach of POEtic systems to create multiple redundant copies 
of an organism during configuration. This approach, used in the Embryonics 
systems m introduces improved fault tolerance through a simple comparison 
of the outputs of multiple identical circuits (the biological analogy would reside 
in the survival of a population even if individuals die). 

This form of fault-tolerance, while relatively simple to realize (through cycles 
in the coordinates, as in Fig. 0 or by using multiple diffusers of the same kind 
in a gradient-based system), requires material outside of the POEtic circuits to 
identify (and possibly kill) faulty organisms. 

A more interesting approach (and one of the research axes of the POEtic 
project) tries to exploit mechanisms inspired by biological immune systems 0E]. 



150 



Gianluca Tempesti et al. 




Ml MO SI so 







]a 


L0[ 






jclof 








jc] 


LO) 






■|cio[ 


n 






]ci 








jclof 





I/O PORTS 

Ml = Tens of minutes 
MO = Units of minutes 
SI = Tens of seconds 
SO = Units of seconds 
CK = Operating clock 



X 

Y 



Coordinate 

Coordinaten 



i{9j9 



naten 



Fig. 7. Multiple redundant copies of an organism through coordinate cycling. 



In higher organisms, these systems rely on a multi-layered, distributed approach 
that is robust and capable of identifying numerous pathogens and other causes 
of illness, and is undoubtedly an interesting source of inspiration for the devel- 
opment of reliability mechanisms in silicon. 

The approach relies on a negative selection algorithm 0 coupled with a 
more “conventional” mechanism that could be based on roving self-test areas 
(STAR) p. In practice, the approach merges a fixed testing mechanism (the 
innate part of the immune system) with a learning negative selection mechanism 
(the aequired part), working at a system or subsystem level to monitor the state 
of the circuit and identify and kill cells that develop faults at runtime. 



4.4 Implementation Issues 

The final goal of the mechanisms we introduced is to illustrate how a hierarchical 
approach to fault tolerance, along the same general lines as the one present in 
complex biological organisms, is a very efficient solution from the point of view of 
assuring the correct operation of the circuit in the presence of faults. By adopting 
a hierarchy of mechanisms, we can exploit the best features of each level of our 
systems, and thus limit the overhead required to obtain an “acceptable” level of 
reliability for the system as a whole. 

In reality, the only relevant overhead generated by introducing fault tolerance 
in our POEtic circuits resides in the molecular level: since all other levels are 
implemented in configware, reliability (and its associated overhead) becomes an 
option, and can be removed should fault tolerance not be a priority. 

The important implementation issues for fault tolerance thus concern essen- 
tially the test and repair mechanisms at the molecular level. As we mentioned, 
however, the design of the molecule is currently under way, and all the important 
parameters required to introduce reliability to the molecular FPGA (connection 
network, degree of homogeneity, configuration mechanism, etc.) have yet to be 
fixed. A complete description of the implementation of fault tolerance in the 
POEtic tissue will thus be the subject of a future article. 



5 Conclusions 

The concepts presented in this article represent an attempt at finding a useful 
set of mechanisms to allow the implementation in digital hardware of a bio- 
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inspired developmental process with a reasonable overhead. The modelization 
of multi-cellular organisms in silicon is an approach that could soon become 
extremely useful for the realization of highly complex systems, where concepts 
such as self-organization and fault tolerance are becoming key issues. 

Much of our effort is currently focused on finding an adequate approach to 
implement development in the POEtic tissue. We are examining processes that 
are at the same time efficient from the point of view of the required hardware 
resources and scalable across one or even multiple chips. Notably, we hope to 
develop a mechanism versatile enough to allow on-line addition of new chips (that 
is, the user will be able to add new hardware to the system without disrupting its 
operation) , thus achieving something closer to the physical growth of biological 
organisms. 

The two developmental models described in this article fit remarkably well 
our requirements (scalability, robustness, efficiency, etc.), and complement each 
other remarkably well: the coordinate-based approach is most efficient for “con- 
ventional” systems where the layout of the circuit is known in advance, while 
the gradient-based approach provides an increased versatility invaluable for the 
realization of adaptive systems such as neural networks. Moreover, the architec- 
ture of our POEtic machines places the developmental mechanism where it can 
be modified by configware, allowing in the future the implementation of different 
approaches. 

It is definitely too soon to draw any conclusions on the topic of fault tolerance 
in the POEtic tissue: until the hardware layer is fixed, the application of self-test 
and self-repair techniques is impossible. One aspect that is already apparent, 
however, is that our research is focusing on a hierarchical set of mechanisms 
meant to work together to achieve a level of reliability that would not be possible 
for a single mechanism. 

Finally, we wish to reiterate the reconfigurable aspects of the architecture we 
described: the molecular tissue is meant to be a universal tissue for the imple- 
mentation of bio-inspired systems. The architectures and mechanisms described 
in this article represent only some of the ideas that will be pursued in the POEtic 
project. The circuit, once developed, will be made publicly available and should 
prove a useful tool for research in the domain beyond the participants to the 
project. 
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Abstract. This paper describes a new evolutionary mechanism devel- 
oped specifically for cellular circuits. Called morphogenetic system, it 
is inspired by the mechanisms of gene expression and cell differentia- 
tion found in living organisms. It will be used as the phylogenetic (evo- 
lutionary) mechanism in the POEtic project. The POEtic project will 
deliver an electronic circuit, called the POEtic circuit, with the capa- 
bility to evolve (Phylogenesis), self-repair and grow (Ontogenesis) and 
learn (Epigenesis). The morphogenetic system is applied to the gener- 
ation of patterns and to the evolution of spiking neural networks, with 
experiments of pattern recognition and obstacle avoidance with robots. 
Experimental results show that the morphogenetic system outperforms 
a direct genetic coding in several experiments. 



1 Introduction 

The POEtic circuit is a multi-cellular electronic circuit composed of a regular 
2D array of cells (i.e. functional units), which has capabilities to evolve (Phylo- 
genesis), self-repair and grow (Ontogenesis) and learn (Epigenesis). The POEtic 
circuit is reprogrammable and its functionality comes from its configuration bits 
or genotype, which will be evolved using the phylogenetic mechanism described 
herein. To implement self-repair and growth, each cell may contain the genome 
of the whole circuit. In addition, each cell has virtual input/output connections 
to sensors and actuators. The POEtic circuit will be used in a wide range of ap- 
plications, including autonomous mobile robotics. A more detailed description 
of the project is given in m- 

This paper is the first accounting of the morphogenetic evolutionary system 
which will be implemented in the POEtic circuit. This system is designed to be 
computationally simple (hardware implementation) and suited to multi-cellular 
circuits like POEtic. The morphogenetic system relies on a simple signalling 
mechanism and expression rules to encode the functionality of a cellular system 
which can be a neural network or any other cellular electronic circuit (e.g. the 
Bio Wall PJ). Its development is motivated by the dynamical reconfiguration 
needs of the POEtic circuit, which cannot be easily handled by direct genetic 
codings with one-to-one mappings between the genotype and the phenotype m- 
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In this paper we describe the morphogenetic system and compare it with a 
direct genetic encoding in experiments of pattern coverage and in the evolution 
of spiking neural networks for pattern recognition and obstacle avoidance using 
real miniature robots. Results show that the morphogenetic system outperforms 
direct genetic encoding by finding individuals of better fitness in less generations. 
Analysis suggests that the morphogenetic system generates more individuals of 
higher fitness than the direct genetic encoding, thus easing the evolutionary 
process. 

2 Morphogenetic Coding 

Introduction. The morphogenetic system is motivated by the requirement of 
the POEtic circuit to have a phylogenetic mechanism which is suited to its 
cellular structure and its capacity to dynamically reconfigure, e.g. when errors 
are detected, when the environment changes or when the circuit is expanded with 
new cells. Direct genetic encodings are not suited for that kind of architecture 
because the number of elements in the system must be known in advance and 
cannot change throughout the life of the system. 

The morphogenetic system assigns a functionality to each cell of the circuit 
from a set of predefined functionalities (something akin to skin, muscle, neuron 
cells, etc. in living organisms). The process works in two phases: first a signalling 
phase then an expression phase. The signalling phase relies on the ability of the 
cellular circuit to exchange signals among adjacent cells to implement a diffusion 
process. The second phase, expression, finds the functionality to be expressed in 
each cell by matching the signal intensities in each cell with a corresponding func- 
tionality stored in an expression table. The genetic code contains the position 
of diffusing cells (diffusers) and the content of the expression table, which are 
both evolved using a genetic algorithm. The morphogenetic system is inspired by 
the mechanisms of gene expression and cell differentiation of living organism |0, 
notably by the fact that concentrations of proteins and inter-cellular chemical 
signalling regulate the functionality of cells. Other works related to this mor- 
phogenetic systems include L-Systems [0, use of developmental processes m. 
and also approximate representations [P seeking to compress the genotype. 



Description. The morphogenetic system relies on a set of predefined function- 
alities which is called a family of functions. The family of functions must offer a 
rich enough repertoire of functionalities to realize the desired circuit. The family 
which is used when evolving neural networks on the POEtic circuit is composed 
of spiking neurons with different connectivity patterns, and either excitatory or 
inhibitory characteristics (fig. p. Spiking neurons are used because they display 
rich non-linear dynamics and because they are well suited for implementation in 
digital circuits 0. 

The cellular circuit allows for signals to be exchanged between adjacent cells. 
A signal is a simple numerical value (the signal intensity) that the cell owns, 
and that adjacent cells are able to read (the intensity of signal s in cell i is 
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Fig. 1. In the current implementation, a family composed of 12 functions is used when 
evolving neural networks. The functions are spiking neurons with different connectiv- 
ities (the 6 types of connectivity shown in the figure) and with either excitatory or 
inhibitory characteristics. Each cell of the multi-cellular circuit implements a single 
neuron, shown in gray. It receives inputs from neighboring neurons (outlined), which 
are implemented in neighbouring cells. Each neuron has an extra external input (curved 
arrow). Neuron F 5 is equivalent to a void cell. At the boundary of the cellular array 
the connectivity is truncated (no periodic boundary condition). 




The cellular array with two 
diffusers (t*0). 



Signaling phase (t»2) 



The signaling phase is 
complete (t=15) 



Expression phase 



Fig. 2. The three arrays on the left are snapshots of the signalling phase with one type 
of signal and two diffusers (gray cells) at the start of the signalling phase (left array), 
after two time steps (middle) and when the signalling is complete (right). The number 
inside the cells indicates the intensity of the signal in hexadecimal. The expression 
table used in the expression phase is shown on the right. In this example the signal 
D matches the second entry of the table with signal F (smalleing distance), thus 
expressing function Fi. 



noted by Cl). Special cells, called diffusers, own a signal of maximum intensity. 
The signal intensity in the neighbouring cells then decreases linearly with the 
Manhattan distance to the diffuser. In the current implementation four type 
of signals (each represented by a 4-bit number) are used. They are diffused 
independently, without interaction among them. Figure Qshows an example of 
the signalling phase in the case of a single type of signal, with two diffusers 
placed in the cellular circuit. 
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Fig. 3. The genetic code contains two parts. The first is the expression table T, here 
with n entries. Each entry is 16 bits long. The second part contains the location of the 
diffusers and their type. The number of bits for the X and Y coordinates depends on 
the size of the network. The type of the diffuser is encoded on 2 bits. 



The expression phase assigns a function to each cell by matching the signal 
intensities inside that cell with the entries of an expression table T (fig. 0 stored 
in the genetic code. Each entry of the table contains the intensities of the four 
signals and the function to express in case of match. The intensity of signal s 
in the entry j of the table is noted by T^. A cell i is said to match an entry j 
of the expression table when the distance d = X^s=i ^Op{C\,Ti) is minimum. 
The distance operator DOp is the Hamming distance. 

The genetic code contains two parts. The first part is the expression table 
T and the second is the location of the diffusers (fig. 0. The expression table 
contains 16 bits for each entry (4 signals coded on 4 bits each). Each entry 
corresponds to a predefined function. The functions are not encoded and evolved 
in these experiments. The location of the diffuser is stored as a pair of X,Y 
coordinates, plus two bits indicating the type of the diffuser (i.e. 2^ = 4 types 
of diffuser). 

A genetic algorithm is used to evolve the morphogenetic coding. In all of 
the experiments presented here the population is composed of 50 individuals, 
selection consists of rank selection of the 15 best individuals, the mutation rate 
is 1%, one-point crossover rate is 20% and elitism is used by copying the 5 best 
individuals without modifications in the new generation. 

3 Pattern Coverage Experiment 

This experiment is aimed at testing whether the morphogenetic system can gen- 
erate circuits with diversified structures (without any specific functionality). It 
consists of covering an array of 8x8 cells with a specific binary pattern. Four 
different patterns are tested (fig. 0. A family of two cell functions (black or 
white cell) is used. The morphogenetic system uses two entries in the expression 
table and 16 diffusers. Preliminary tests with numbers of diffusers in the range 
of 2 to 64 have shown that a too low or a too high number of diffusers hinder 
the performance of the morphogenetic system with some patterns. Values in the 
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Mixed 1 




Mixed 2 




Checkerboard 



Uniform 



Maximum fitness with the mixed 1 pattern (avg of 50 runs) 





Fig. 4. The pattern coverage experiment consists in covering an array of 8x8 cells with 
a binary pattern. The left column shows the four desired pattern [mixedl, mixed2, 
checkerboard and uniform). The right column shows the evolution of the maximum 
fitness for the mixedl and mixed2 patterns. 



range of 5 to 20 showed quite similar performance. As a good compromise, 16 
diffusers have been chosen. The size of the morphogenetic coding is 160 bits: 2 
entries in the expression table * 16 bits + 16 diffusers * 8 bits (6 bits to store the 
coordinates and 2 bits to store the type of the diffuser). Also, a direct genetic 
encoding is used for comparison, which consists of a binary string with 1 bit per 
cell (indicating whether the cell is black or white), resulting in a code of 64 bits. 

The figure 21 shows the evolution of the maximum normalized fitness, which 
is proportional to the number of cells covered correctly, for the mixedl and 
mixed2 patterns, averaged over 50 runs. The evolution of the fitness for the 
uniform and checkerboard patterns is very similar to the mixedl pattern. In 
the first few generations, the maximum fitness increases notably faster with 
the morphogenetic system than with the direct coding with all four patterns. 
With the uniform, checkerboard and mixedl pattern the morphogenetic system 
covers the patterns almost three times faster than the direct coding in terms 
of generation number (about 10 generations instead of 30). The morphogenetic 
system cannot fully cover the mixed2 pattern, but it comes very close. This 
seems to indicate that some type of regularity is necessary for the morphogenetic 
system to perform well, or that the system has some biases toward some types 
of pattern. Further investigation are necessary to clarify this point. However, in 
the context of the POEtic circuit this may not be an issue, as the epigenetic 
(learning) mechanism may deal with structures at a finer granularity level, while 
the phylogenetic mechanism deals with structures at a coarser level. Further 
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experiments may also be performed to assess how the morphogenetic system 
scales with larger arrays. 

4 Spiking Neuron Model 

The spiking neuron model used in the following experiments is a discrete-time, 
integrate-and-fire model with leakage and a refractory period. Each neuron has 
weighted inputs (-1-2 or -2 depending on whether the presynaptic neuron is exci- 
tatory or inhibitory) from connected neurons. Each cell can express one neuron 
type, from a family of 12 (6 connectivity patterns times 2 sign types) shown in 
fig Q It has one more connection from an external input, e.g. to connect from a 
sensor, with fixed weight of -1-10. The neuron integrates the incoming spikes in 
the membrane potential, according to the weights of the connections. Once the 
membrane potential reaches a threshold (fixed to 4), the neuron fires (emits a 
spike), resets its membrane potential to 0 and enters a refractory period where 
it does not integrate incoming spikes for one time step. After the integration 
phase and if the neuron has not fired, leakage is applied by decrementing mem- 
brane potential by 1 (or incrementing it if the potential is below 0), so that the 
potential tends to 0. 

5 Pattern Recognition Experiment 

A circuit of 8x8 cells (fig. 0 was evolved to recognize characters (any other pat- 
tern could be used) using two training sets: one set contains corrupted versions 
of two characters, the other set contains random patterns. The circuit must in- 
dicate whether the current pattern is one of both characters or not. The fitness 
of the network is evaluated by presenting successively all the patterns of a train- 
ing set, and it is equal to the number of times it correctly classifies the input 
pattern. Fig. 0shows the training set for the recognition of characters A and 
C (noted as A-l-C). The upper line shows the subset of patterns to recognize. 
The second line contains random patterns that must be rejected. The maximum 
fitness achievable is 20 (20 patterns in the training set). The experiments have 
been performed with four different training sets, for the recognition of characters 
A-l-B, A-l-C, A-l-D and A-l-E. 

Figure Qshows the array of cells in the circuit. The input pattern is applied 
on a subset of cells through the external input of the cells. Each cell receives 
one pixel of the pattern: if the pixel is black, then it receives a spike every two 
time steps, otherwise it receives no spike. The network is run for 100 time steps 
with the input applied to it, after which the activity of the output neuron on the 
right of the network is read. The activity level (number of spikes) of that neuron 
indicates whether the pattern is recognized or not (threshold=50% of maximum 
spike number). 

A morphogenetic system with 16 diffusers and 12 entries in the expression 
table (one for each type of neuron) is compared to the direct coding. The size 
of the morphogenetic coding is 320 bits: 12 entries in the expression table * 16 
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Fig. 5. Left: training set for the recognition of the patterns A and C. It is composed 
of 20 patterns in two subsets of 10 patterns. The upper line contains the patterns to 
recognize which are the letters A and C. The second line contains the random patterns 
that the network must reject. Right: layout of the cells in the circuit. The input pattern 
is applied on an array of 7x8 cells on the left, with each cell receiving the input from 
one pixel of the pattern. The output of a single cell at the right column is used to 
indicate whether the character has been recognized or not. 

Table 1. Number of runs, out of 100 performed for each training set, reaching the 
maximum fitness. 





8x8 network 


Training set 


Direct 


Morph. 


A-bB 


18 


27 


A-bC 


8 


34 


A-bD 


19 


20 


A-bE 


14 


54 


Total (max is 400): 


59 


135 



bits + 16 diffusers * 8 bits (6 bits for the coordinates and 2 bits for the type of 
the diffuser). The size of the direct coding is 256 bits (12 type of neurons, thus 
4 bits/cells * 64 cells). 

The circuit has been evolved one hundred times for each of the four training 
sets (fig. 0. The morphogenetic system outperforms the direct coding, both 
when comparing the maximum fitness reached at a given generation and when 
comparing the number of runs that have reached the maximum fitness after 50 
generations. Table 0 reports the number of runs (on the 100 runs performed) 
where the maximum fitness of 20 is reached. Averaged over the four training 
sets, runs finding a maximum fitness using the morphogenetic coding are more 
than twice as frequent than when using the direct coding. 



6 Mobile Robot Controller 

A spiking circuit is evolved as a controller for a Khepera miniature autonomous 
robot. The objective is perform obstacle avoidance using the sensory informa- 
tions coming from the proximity sensors of the robot (fig. 0. Four sensory groups 
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Fig. 6. Evolution of maximum fitness for pattern recognition with the four training 
sets. 



of two cells are connected to the infrared sensors. The activity of two other cells 
are used to set the speeds of the wheels. 

The robot has a sensory-motor period of 100ms. During that period, the 
neurons on the circuit are updated 20 times and, according to the distance to 
the obstacles, either 0, 1 (the ’’low” activity), or both input neurons (’’low” and 
’’high” activity) of each sensory group receive a spike train of period 2 (one spike 
every two time steps) . At the end of the sensory-motor period, the speeds of the 
wheels are updated and the proximity sensors are read to compute spike trains 
for the next sensory-motor cycle. Some amount of noise is introduced in the spike 
trains. The cells ML and MR control the speed of the wheels in a way which is 
inversely proportional to their activity. This allows the robot to move forward 
when no obstacles are sensed and thus when there is potentially no activity in 
the network. A minimum activity of the neuron sets the speed of the wheel to 
-1-80 mm/s. A maximum activity sets the speed to -80 mm/s. The speed of the 
wheels scales linearly in between. 

The spiking neuron model used here is the same as used in the pattern 
recognition experiment (section with the addition of random variations in 
the threshold value to avoid locked oscillations during the 100 ms sensory-motor 
period. For each neuron and at each time step the threshold value is incremented 
or decremented by 1 with a probability of 5%. 

The fitness of the robot is measured on two tests of 30 seconds in a rectangular 
arena (40x65 cm) . It is the average of the fitness computed at each sensory-motor 
step using the following equation [E]: f = v ■ {1 — Av) • (1 — p) , where v is the 
average speed of the two motors, Av is the absolute value of the difference of 
speed of the motors, and p is the activity of the most active sensor {y, Av and 
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Fig. 7. The Khepera robot (left) and the 
neural controller (right). The Khepera has 
8 proximity sensors. They are grouped by 
two, taking value of the most active sen- 
sors, to have 4 sensory inputs SI to S4. 
The circuit receives SI to S4 as sensory 
inputs. Each input is coded on two neu- 
rons. The cells ML and MR control the 
speed of the wheels according to their ac- 
tivity. 



Ms<imum fitness for obstacle avoidance (aug of 7 tuns) 




Fig. 8. Evolution of the maximum fitness 
in the obstacle avoidance experiment (av- 
erage of 7 runs on a physical robot). 



p are in the range [0;1]). The three parts of this function aim to 1) maximize 
the speed of the robot, 2) minimize the rotation and 3) maximize the distance 
to the obstacles. 

Seven runs were performed to compare the same morphogenetic system and 
direct coding as those used in the pattern recognition experiments. The morpho- 
genetic system performs better than the direct coding (fig. 0: it clearly achieves 
a higher maximum fitness than the direct coding. Moreover, after 15 genera- 
tions, only three runs managed to find individuals displaying obstacle avoidance 
behavior with direct coding, whereas with the morphogenetic system individu- 
als were found displaying this behavior in all seven runs. It is thought that the 
difference in performance may come from some characteristics that individuals 
generated with the morphogenetic system possess but which are harder to get 
with the direct coding. A preliminary analysis is discussed below. 

7 Analysis 

The better performance of the morphogenetic system over the direct genetic 
encoding may be due to differences in characteristics of the fitness landscape. 
Ruggedness is often linked to the difficulty of search when genetic algorithms are 
used [0. To investigate the level of ruggedness, random walks using the mutation 
operator have been performed in the pattern recognition experiment. For 600 
individuals of maximum fitness, 5 random walks of 10 steps (10 consecutive 
applications of the mutation operator) have been performed on circuits evolved 
using the morphogenetic system and the direct genetic encoding. The fitness 
averaged on the 3000 random walks is plotted against the number of application 
of the mutation operator in fig0 The fitness drops faster with the morphogenetic 
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Random uvaks. siaiting from points of max fitness 




Table 2. Maximum fitness and standard 
deviation of the fitness in a sample of 5000 
randomly generated individuals. 





Direct 


Morph. 


Training set 


Max 


Std dev 


Max 


Std dev 


A+B 


11.7 


0.77 


11.92 


0.88 


A+C 


10.38 


0.74 


12.14 


1.08 


A+D 


12.22 


0.64 


12.1 


0.85 


A+E 


10.58 


0.95 


12.28 


1.03 



Fig. 9. Comparison of the random walks. 
The fitness drops faster with the morpho- 
genetic system when going farther away 
from points of maximum fitness. This 
seems to indicate that the fitness land- 
scape is more rugged with the morpho- 
genetic system. 



system when moving away from a point of maximum fitness, which may imply 
a more rugged fitness landscape. The better performance of the morphogenetic 
system thus cannot be explained by having a smoother fitness landscape. 

We also measured the fitness values of 5000 randomly generated individuals 
using the morphogenetic system and the direct genetic encoding. The statistics 
of table EJfor the pattern recognition task show that the morphogenetic system 
tends to display higher values and higher standard deviation of the fitness dis- 
tribution. Although the differences are small and may not be significative, the 
higher fitness variability may explain why the morphogenetic system can gen- 
erate individuals of the same fitness faster than direct genetic encoding. The 
reason why morphogenetic systems also generate individuals with higher fitness 
is still not clear. We speculate that the difference can be explained by structural 
properties of the circuits generated by the two genetic encodings. For exam- 
ple, in the case of the robotic experiments, we noticed that the morphogenetic 
system generated very easily large patches of interconnected excitatory neurons 
that link sensors to motor neurons, causing a reversal of wheel speeds when the 
robot approaches an obstacle. Instead, direct genetic encodings take more time 
to set all the redundant cells to suitable values. However, these data are only 
preliminary require further statistical investigations. 



8 Conclusions 

We have presented a simple morphogenetic system and compared it to a direct 
genotype-to-phenotype mapping. The morphogenetic system uses a simple sig- 



A Morphogenetic Evolutionary System 163 



nailing and expression mechanism, which is inspired by the mechanisms of gene 
expression and cell differentiation of living beings. Several experiments were per- 
formed and the morphogenetic system outperformed a direct coding, both when 
comparing the maximum fitness reached at a given generation and when com- 
paring the number of runs reaching the maximum fitness. The morphogenetic 
system, designed to be simple (hardware implementation), and its good perfor- 
mance make it well suited for use in the POEtic circuit. Further experiments 
need to be done to see how the morphogenetic system scales to networks of larger 
size. Also, further analysis are needed to explain the better performance of the 
morphogenetic system. As the mechanism is inspired upon biology, it may be 
that knowledge from that field will help to shed some light on this topic. 

However, the real strength of the morphogenetic system is yet to be realized. 
It resides in its capacity to handle run-time dynamically reconfigurable circuits, 
e.g. by adding or removing diffusers, or by expressing the functionality of newly 
inserted cells in the circuit. Those aspects must still be explored, however they 
may pave the way toward dynamically reconfigurable electronic circuits that re- 
organize when a sensor fails or when new sensors or actuators are added to the 
system. In the future we also plan to extend the system by allowing environ- 
mental interactions while the circuit evolves, as well as including the evolution 
of functions that each cell can take, which, for the sake of simplicity, here have 
been limited to a predefined set. 
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Abstract. Vertebrate and most invertebrate organisms interact with their envi- 
ronment through processes of adaptation and learning. Such processes are gen- 
erally controlled by complex networks of nerve cells, or neurons, and their in- 
teractions. Neurons are characterized by all-or-none discharges - the spikes - 
and the time series corresponding to the sequences of the discharges - the spike 
trains - carry most of the information used for intercellular communication. 
This paper describes biologically inspired spiking neural network models suit- 
able for digital hardware implementation. We consider bio-realism, hardware 
friendliness, and performance as factors which influence the ability of these 
models to integrate into a flexible computational substrate inspired by evolu- 
tionary, developmental and learning aspects of living organisms. Both software 
and hardware simulations have been used to assess and compare the different 
models to determine the most suitable spiking neural network model. 



1 Introduction 

Simple spiking neuron models are well suited to hardware implementation. The 
particular aspects which are hardware friendly include the ability to translate the 
output of spiking models to a 1-bit digital representation, and the ability to 
completely specify the models with a few key parameters. Here we describe some 
recent research into Spiking Neural Networks (SNN) which can be realised using the 
POEtic tissue: a flexible computational substrate inspired by evolutionary 

(phylogenesis), developmental (ontogenesis), and learning (epigenesis) aspects of 
living organisms. The POEtic tissue follows a 2-D cellular structure consisting of 
three layers: a genotype plane; a configuration plane; and a phenotype plane, whose 
combination allows multi-cellular electronic organisms to be easily evolved and 
grown. For the case where the tissue is required to learn, initially the phenotype will 
be a SNN although other learning organisms may also be developed on the tissue. A 
more detailed description of the POEtic project can be found in [1]. 

Several bio-inspired models capable of endowing the tissue with the ability to 
learn have been studied [2], ranging from a complex biophysical representation based 
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on modified Hodgkin & Huxley dynamics to a greatly simplified Integrate to thresh- 
old & Fire model. To balance the conflicting requirements of hardware resources and 
bio-realism, we concentrate on an intermediate level of complexity [3]. 

Biophysical and computational aspects of spiking neuron models have been exten- 
sively studied [4]. Many SNN models now also exist which are suitable for imple- 
mentation in digital hardware, e.g., [5, 6], with hardware platforms continually being 
developed for their real-time parallel simulation, e.g., [7, 8]. However, to our knowl- 
edge, no SNN models exist which have been successfully developed for their imple- 
mentation on a digital hardware device which are suitable for combination with onto- 
genetic [9] and phylogenetic [10] methods. 



2 Model Description 

In this section we describe the three principle aspects of SNN models. These are the 
method used in modelling the internal membrane dynamics of an individual neuron; 
the learning dynamic (which in this case is to be built-in to the individual neuron); 
and the topological rules by which networks of these neuronal elements are con- 
nected. 



2.1 Membrane Dynamics 

The general form of linear differential equation for a spiking neuron is given in equa- 
tion (1) 

= -G„ {V, it) - ) - ! G, (f)(y. it) - yj (D 

at j^Q 

Where V/ 1) is the membrane potential of cell i at time f, and are the membrane 
capacitance and conductance, respectively, is the resting potential of the cell, and 
Gj( t) and V. are the time dependent conductance change and reversal potential for the 
n synaptic inputs. The membrane potential is compared with a threshold value to de- 
termine if the neuron outputs a spike. After each output spike the membrane potential 
is instantaneously reset to the resting potential, and no further output spikes are gen- 
erated for a fixed refractory period. Equation 1 can be rearranged to give 

= + ( 2 ) 

dt 

where /, (?) = (g„ -I- I G^ (f ))/ C„ and h. (?) = + I G^ {t)V ^ )/ C„ , the solution 

of which can be approximated, for a time step of Zl? as 

y, (? + A?) = (y, (?) - /*, (?)//, (?)) exp(- /, (?)A?) + /i, (?)//, (?) (3) 

The quantity can be regarded as the time constant of the cell. Ignoring the ef- 
fects of the synaptic inputs, this can be approximated by the constant = CJG^. 
Thus 
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v,(»+A») = r„„+r,.(v,(»)-v„„)+(i-r,.)! /„«) (4> 

where = exp(- Attx^ and I I (t) approximates the change in membrane potential 

resulting from synaptic (and any other) input currents. We refer to as the kinetic 
membrane constant. This allows the exponential decay of the membrane to be ap- 
proximated using a simple algebraic expression. An alternative approach is described 
in section 4, dealing with hardware aspects. A further simplification results from set- 
ting to zero to give the final form of the spiking neuron model. The firing condi- 
tion is 



if Vi(t) > 0 then S^{t) = 1 
if Vi{t) < 0 then S- (t) = 0 

where S/t) is the binary valued output of the model, and ^is the (fixed) firing thresh- 
old of the cell, a value of 1 for the variable S- signals the presence of an output spike. 
The update equation used in combination with the firing condition is 

if S- (t) = 1 then V. (t + At) = 0 

ifS,«) = OthenV',« + A<)=*.V',(0+(l-tJ! /„«) 



2.2 Learning Dynamics 

Although several attempts at applying conventional artificial neural network learning 
algorithms (e.g., backpropagation) to SNNs exist, e.g., [11], for the purposes of the 
POEtic project we required a simple unsupervised learning method which would al- 
low the adaptation of weights based on pre- and post-synaptic activity. A recent dis- 
cussion of several such spike-timing dependent synaptic plasticity (STDP) learning 
rules can be found in [12]. Whilst these rules offer a fast and unsupervised adaptation 
of synaptic weights, the resulting networks are often prone to memory loss in noisy 
environments. However, a method recently proposed by Fusi [13, 14] aims to address 
this problem through discretising the synaptic weight variable. It has also been shown 
via simulation that this method reduces the computational load per neuron for models 
using plastic synaptic weights [15, 16]. An added advantage of this technique, from 
our perspective, is its increased suitability for digital hardware implementation. The 
learning rule proposed by Fusi uses an internal learning state variable for each synap- 
tic input which functions as a short term memory, the value of this variable is used to 
discretise the synaptic weight into a predetermined number of states. We have chosen 
to modify the STDP method used by Fusi for updating the synaptic weight, so that 
the state variable is increased if the post-synaptic spike occurs after the pre-synaptic 
spike, and is decreased if the opposite temporal relation holds. 

Each synapse has two main variables for the connection from cell j to cell i, these 
are an activation variable A., and a learning state Lj^. The activation variable depends 
on the class of cells i and j, i.e. inhibitory or excitatory. The synapses of excitatory to 
excitatory neurons have activation variables which can vary between a chosen num- 
ber of discrete states, i.e., [0, 1, 2, 4], all others have a constant A^^.=l. 
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The learning state, L.j, is governed by the following equation: 

Ly (f + 1) = X Ly (t) + {yDj (t ) X S, (t))- { yD_ (f) X (f)) (7) 

where is a kinetic activation constant used to approximate an exponential decay 
with time constant of (f ) . The YD variables define a time window (with expo- 
nentially decaying effect) after each pre- and post-synaptic spike where modification 
of the learning state, Ly (t) , is possible. For post-synaptic neuron i, YD is modeled as: 

if5.(t) = l YD,{t + \) = YD^^ 
if5.(t) = 0 TA(t + l) = k,_xro,(t) 

where is a kinetic learning constant approximating an exponential decay with 

time constant . From equation 7, we can see that in the absence of any spiking 
activity, the learning state is a continually decreasing function. However, if the pre- 
synaptic cell fires just before a post-synaptic cell the learning state will increase by 
YDj{t) , whereas if the pre-synaptic cell fires just after the post-synaptic cell, the 

learning state will be decreased by YD. it) . The respective value of the YD variables 

reflects the level of coincidence of the pre- and post-synaptic firings as defined by 
equation 8. The learning state is used to determine the discrete level of activation of 
the synapse The number of discrete activation states is a fixed parameter of 

the model, and may be two [0,1], three [0,1,2], or more. Note that, according to equa- 
tion 7, if there is no activity at the synapse then Ly drifts back towards 0 (whether it's 
value is positive or negative). If Ly reaches a positive threshold {Ly > L^y) and Ay is not 
already equal to A , then A . is incremented and the value of L . is reset to L. - (2 x 
L^y). Likewise, when Lj. reaches a negative threshold (Ly < - L^y) and Ay is not already 
equal to 0, then Ay is decremented and the value of Ly is reset to Ly H- (2 x L^y). The 
input ly in equation 4 is then given by 

(9) 

where VF is a constant multiplier used for generating the final weight added to the 
membrane potential on receiving a given input. 

2.3 Network Topology 

In this section, we consider the method of network creation. While evolutionary tech- 
niques [10] may be applied to determine the parameters of any part of this model, on- 
togenesis [9] is essentially concerned with the topology. For our purposes this process 
has been automated using a set of topological rules. These rules define the size of the 
network, the method and depth of connectivity and the neuronal class proportion. In 
the experiments which follow, the network contained 50 x 50 neurons N=[0,2499], 
80% of which were excitatory, leaving 20% inhibitory, in line with the generally ac- 
cepted ratio of the neo-cortex [17]. The excitatory cells may connect to other cells 
with differing synaptic weight values (see 2.2 above). In the simulations described 
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below, each cell makes connections to other cells within a 5 x 5 neighbourhood, i.e., 
24 other cells. 



3 Software Simulation Results 

In this section, we explore the potential of the SNN model described previously to 
mimic experimental results of neuroscientific studies. We first study the behaviour of 
the network with static synaptic weights, then with the learning activated we look at 
its ability to perform a visual discrimination task. 

The network was first initialised (f=0) where all neuronal states are set to zero 
(5;(0) = 0)and all membrane levels are set to zero(V^.(0) = O) . A brief pulse stim- 
uli, with sufficient amplitude to fire the neuron, was then input to all excitatory cells 
in the network. Sustained activity was then observed within the network, achieving 
low average network activities (e.g., 1 spike every 96 sampling intervals) without 
causing global oscillations. A trace recorded from one of the output cells can be seen 
in figure 1 . This simulation without learning demonstrates that a simple network can 
hold information indefinitely through persistent activity. 











Fig. 1. A five second recording of the activity of a single cell with a mean interval of 40 sam- 
ples. 



The next stage is then to allow learning to investigate the possibility of exploiting 
different activity patterns as memory depending on the input stimuli. We apply the 
learning rule as described earlier to the same network used in the previous experi- 
ment. The network is initialised at t=0 as before. As well as the input stimuli, this 
time a low-level Gaussian noise input is also applied at each time step to every cell in 
the network. This stimuli was designed to simulate sinusoidal gratings moving 
horizontally at different speeds over a visual receptive field. Training of the network 
then involved moving the stimulus across the network from left to right at speed=l 
(i.e., 50 neurons per second) for a period of 10 seconds. 

Figure 2 shows the results following training for applied stimulus speeds of 1 and 
2, respectively. In addition to the response of a single excitatory neuron a “local” ac- 
tivity was also measured. The local activity corresponds to the average activity in a 
narrow strip perpendicular to the movement of the grating. These results demonstrate 
that the network has been successfully trained to discriminate direction of a moving 
stimulus across the network. 

The effect of reducing the strength of the stimulus is illustrated in figure 3. Here 
only the forward moving stimulus triggered any response, relying on the presence of 
additional input noise to provoke the discharge of several neurons. This result sug- 
gests that the trained network is also sensitive to an input stimulus of varying ampli- 
tude. 
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Fig. 2. The response of a single excitatory cell (top trace) and local activity (bottom trace) for 
input stimuli moving at two speeds, in both forward and reverse directions, across the network. 
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Fig. 3. Recordings from a single cell and associated local activity as a result of applying low 
level stimulation (original amplitude/4) moving at a faster speed (200 neurons/sec). 



4 Digital Hardware Implementation 

Several aspects of the previously described model remain under development. We 
present here the latest implementation of several functional units following the speci- 
fications defined by the theoretical model. 

A block diagram of a neuron with two inputs is shown in figure 4. From this dia- 
gram it can be seen that the inputs are added and the result is stored in a register. At 
the following clock event this result is added, using the second adder, to the current 
membrane potential fed back from the decay function block. The result of this addi- 
tion gives the new membrane potential now found at the output of the decay function 
block. However, if the result of the first addition is zero ,i.e., no inputs are present, 
then the membrane potential is not updated and instead the decay function becomes 
active implementing the leaky membrane. 

Equation 4 represents a common approach to approximating exponential decay. 
However, the decay function currently used is an approach more suited to hardware 
implementation approximating exponential decay through the control of two parame- 
ters M and STEP. With the membrane potential held in a shift register, the algorithm 
is implemented as follows: 
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m=Most_signif icant_l (membrane) +1 (i) 

m=min(m,M)-l (ii) 

delta_mem=m>>membrane (iii) 

iSTEP= (M-m) *2®™^ (iv) 



• The first step is to determine the Most Significant Bit (MSB) of the membrane po- 
tential which is stored as the variable, m (i). 

• If the MSB is set then the membrane potential is decreased by one and steps (ii) 
and (iii) are bypassed. If the MSB is not set then the minimum of M and the MSB 
of the membrane potential less one is calculated (ii). 

• The shift register holding the membrane potential is then shifted down by m bits to 
give delta_mem (iii). 

• Finally, the time delay (iSTEP) until implementing the new value is calculated 
(iv). 




Fig. 4. A block diagram representation of the neuron model implementation prototype for digi- 
tal hardware 



The output of the decay function block is continually compared with the firing 
threshold. If the firing threshold is surpassed then the output of the comparator goes 
high, producing an output spike. The output is also fed back resetting the decaying 
membrane potential to implement an approximate refractory period. 

The second part of the implementation consists in the definition of a memory for 
each synapse which works according to the learning rules explained earlier. Referring 
to figure 5, the counter holds a variable ‘xmem’. Upon receiving a spike at the synap- 
tic input a new value of ‘xmem’ is loaded into the counter. If the membrane potential 
of the neuron is greater than mem_threshoId (note: not the firing threshold), 
xmem(t+l)=xmem(t)+ Ax, otherwise xmem(t+l )=xmem(t)- Ax. However, if no spike 
is present at the synaptic input the counter increases or decreases its value depending 
upon its current value i.e. if xmem is in the range [0,7] it will decrease to zero other- 
wise it will be increased to its maximum value of 15. 

This implementation represents a simplified version of the methods discussed in 
section 2.2 where four separate activation states were described. Here, xmem repre- 
sents the learning variable L.. where xmem=[0,7] and xmem=[8,15] represent the two 
activation states Agg=0 and Agg=l respectively. Simple methods for implementing fur- 
ther activation states are currently under investigation. 
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Simulations of this implementation have also been completed showing its correct 
operation governed by the theoretical model, for more details see [2]. 



membrane 




Fig. 5. A block diagram representation of the digital circuit used to implement synaptic weight 
storage and adaptation 



5 Discussion 

The model presented has been developed following stringent guidelines, tested first 
using a software model and finally presented as a hardware implementation. 

Neurophysiological studies of neuronal correlates of working (short-term) memory 
show that the maintenance of a stimulus in working memory is accompanied by per- 
sistent (enhanced relative to spontaneous activity) activity in a selective subset of 
neurons. A long-standing hypothesis has been that persistent activity is a collective 
property of a network, and that it is maintained by recurrent excitatory feedback [18]. 

It is difficult to achieve sustained activity without high firing rates. In a network of 
the type we have envisaged, depending on appropriately selected synaptic strengths, 
the activity obtained consists generally of synchronous oscillations. Only if external 
input (noise) is applied other sustained patterns of activity will appear. 

The simulations that were made indicate that various patterns of sustained activity 
without external input can be obtained if some modifications are made to the typical 
integrate-and-fire model. Our simulation results do not represent an attempt at an ex- 
haustive analysis of the model but an exemplification of some possibilities for uses of 
the model once combined with the ontogenetic and phylogenetic mechanisms. 

The theoretical model and simulation results presented have been used to develop 
a prototype for digital hardware implementation. This implementation is currently 
under development and once complete should fully respect the theoretical model. 
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Abstract. A simple robot control system is used to demonstrate bio- 
inspired fault tolerance techniques; a multi-layered hardware immune 
system is used for fault detection and an embryonic array for fault 
avoidance. The acquired layer of the immune system monitors system 
behaviour for unusual activity (normal behaviour is learnt during an 
unsupervised teaching period). An non-learning innate layer is then em- 
ployed to localise the fault if possible. The embryonic array allows sim- 
ple, robust reconfiguration to avoid the fault; as many faults as spare 
cells can be accommodated. The complete system, including the learn- 
ing algorithm, is implemented on a Virtex FPGA. Results showing the 
appropriate response to different types of faults are given. 



1 Introduction 

Through millions of years of refinement biology has produced many living crea- 
tures that are remarkably fault tolerant. They can survive injury, damage, wear 
and tear, and are under continual attack from other living entities in the form of 
infectious pathogens. They make use of multiple mechanisms, or multiple layers, 
to generate this fault tolerance. This paper details a demonstration hardware 
system that is inspired by some of these system to provide fault tolerance. A 
multi-layered hardware artificial immune system that learns normal behaviour is 
coupled with an embryonic array to provide fault detection and fault avoidance. 
A simple control system, to perform object avoidance for robot control, is used 
to demonstrate the system. 

The embryonic array is a homogeneous array of logic units (called cells) that 
use their location within the array to extract appropriate configuration data. 
Each cell contains all the configuration details of all cells and hence can perform 
any cell’s function as required. 

This is part of the POEtic project which aims to produce a circuit that 
combines other forms of bio-inspired techniques [26]. Ultimately the POEtic 
device will be produced in silicon using ASIC (Application Specific Integrated 
Circuit) fabrication. However, this paper details the initial stages of just the 
immune system upon an embryonic array which has been implemented on a 
Xilinx Virtex Field Programmable Gate Array (FPGA). 
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The paper provides a background to artificial immune systems, including 
some details of immunotronics, and embryonics in section 2. The system is de- 
scribed in section 3 which is followed by the results in section 4. Some discussions 
on further work and comments are given in section 5 before the conclusions are 
given in section 6. 



2 Background 

2.1 Artificial Immune Systems 

The immune system found in higher organisms is a multi-layered, distributed 
system that is robust and can identify numerous pathogens and other harmful 
effects. Many of the properties found in such a system would be most advanta- 
geous in many computer and other systems. Artificial immune systems do just 
this. 

One of the most researched algorithms is the negative selection algorithm. 
Developed by Forrest et al. [18], the negative selection algorithm is based upon 
the detection of non-self from self, as found within the immune system. Various 
immune cell types (such as lymphocytes) have receptors that allow them to bind 
to specific sets of proteins. The maturation of each lymphocyte cell involves 
the presentation of proteins that are naturally present within the body (self). 
Lymphocytes that bind with them are destroyed. Hence, when released within 
the body, binding to a protein indicates it is non-self and may be a harmful 
pathogen. See [1, 14,21,23] for more details of immune systems. 

Forrest uses a string to represent the system’s state; partial matching of these 
strings and a detector string is used to distinguish between self and non-self. A 
set of detector strings is generated such that they do not match with the normally 
occurring states - they only match with non-self. Hence, a detector match gives 
an indication that some event has occurred. 

Artificial immune systems have been applied to many applications, those that 
make use of the negative selection algorithm include computer security [18], com- 
puter intrusion [16, 17], network security [9, 19, 22], pattern recognition [11, 20], 
industrial anomaly detection [12, 13] and hardware fault detection (immunotron- 
ics) [4-7]. Comprehensive surveys and reviews can be found in [10, 14, 15]. 

Avizienis [2] discusses, with no specific detail, the concept of using an immune 
inspired process that surrounds a processor to provide fault tolerance. Separate 
and distributed hardware is suggested to isolate the fault detection from the 
processor. 



2.2 Immunotronics 

All the applications listed are software implementations of an artificial immune 
system, with the exception of immunotronics (immune -I- electronics). This was 
devised by Bradley and Tyrrell [4-7]. It makes use of a negative selection al- 
gorithm to identify faults within a hardware circuit, specifically a finite state 
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machine. The current state, next state and the current inputs are used to define 
the current transition of the machine. Some of the transitions are not present 
in normal, error free operation and so are considered as non-self. A 4 bit BCD 
counter implemented upon a Xilinx Virtex FPGA was immunised. The detector 
set was generated offline and implemented using a content addressable memory 
(CAM) [31]. Partial matching between the detectors and the current state, based 
upon the number of contiguous bits, was employed. Experimental results showed 
that a fault coverage of 93% could be achieved with 103 detectors. This is very 
impressive but requires a CAM of 103 words, each 10 bits in width. This is large 
in size compared to the counter that was being immunised. Only fault detection 
has been considered in current implementations. 

The size of the detectors is not dependent on the complexity of the circuit 
being immunised. It might be considered that the counter is the wrong granular- 
ity and would be more appropriate to implement immunotronics at a system, or 
sub-system level. It would not now be used to identify every error, but unusual 
behaviours that indicate something erroneous has occurred. If a robot controller 
is considered, the immune system would provide monitoring for situations that 
should not occur. This may include situations such as a robot that is heading 
for an object. 



2.3 Embryonics 

All multi-cellular organisms start life as a single cell. This cell divides repeatedly 
to generate numerous identical copies of itself. Embryonics (embryo -I- electron- 
ics) is inspired by the cloning and differentiation of cells within multi-cellular 
organisms to generate electronic circuits with some of the properties of such 
organisms [25,27,28,30]. An embryonic array consists of a homogeneous array 
of cells, each containing the full specification of the device. The coordinate of 
each cell is calculated dependent upon its neighbours and is used to define its 
funcionality. 

Errors within the array are accommodated by killing the faulty cell. The 
routing becomes transparent and the coordinate system no longer increments, 
thus the next cell takes over the functionality of the faulty cell. The array con- 
tains spare cells that are not utilised until a fault occurs. To maintain cellular 
alignment, removing the row or column that contains the error is typical. Fault 
avoidance has also been performed at sub-cell level. As many faults as there are 
spare cell columns or rows can be accommodated. 

3 System Description 

3.1 Overview 

The system implemented uses a number of layers to provide fault tolerance 
using the immune system for inspiration. The top level is a negative selection 
algorithm that learns and is analogous to the acquired immune system. This can 
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work at either a system or subsystem level and monitors the state of the system 
for non-self. When non-self is identified the system could be relocated by killing 
the cells within the embryonic array. For larger systems that use a number of 
cells, a second layer could be used to localise the fault. This could take many 
forms: a section of the device could implement a reduced functionality while the 
rest is tested. The functional section is then relocated to allow complete testing. 
Alternatively the device is placed in a safe state and taken offline while the 
testing takes place. Cells that have errors are identified and killed. These tests 
do not learn and hence have been labelled as the innate immune system which 
is present in biological systems. However, this is just an identifier and does not 
reflect the complex processes that occur in a biological innate system. 

A simple embryonic array has been implemented to test the immune system. 
A simple example application is used to demonstrate the system. This takes the 
form of the simplest of robot controllers which has three single bit inputs and two 
single bit outputs; the inputs represent the detection of an object to the robot’s 
left, front and right. The outputs represent signals to the robot’s left and right 
motors. Hence a signal on the left motor will turn the device to the right, both 
motors will cause it to travel forward and a signal on the right motor will turn 
the robot to the left. This is shown in figure 1. The controller is implemented 
within a single cell and a fault detected will result in that cell being killed. Hence 
an innate system cannot localise the fault any further and so is used to test the 
transparency mechanism. The nature of the controller is not important; in this 



Object Detection 
Left Centre Right 




Left Motor Right Motor 
Motor Drive 



Fig. 1. Test System 



demonstration it took the form of a simple lookup table which was driven by 
a linear feedback shift register (LFSR) to generate a random, complete set of 
inputs. When the outputs are considered with the inputs there are a number of 
states that would be considered as normal (self-states) and those that would be 
considered as abnormal (non-self). An example of non-self would be the detection 
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of an object ahead with both motors driving the robot into it. A period of fault 
free operation is used as a learning period where normal, self states are presented 
to the acquired layer. 



3.2 The Embryonic Array 

The embryonic array is based upon the arrays produced by Ortega and Tyrrell 
[24,25]. However, several changes are required. Together with its control cir- 
cuitry, each cell comprises an array of sub-units (called molecules). The func- 
tional unit of each molecule comprises a four input lookup table (LUT) and a 
D-type flip-flop. A three bit bus connects each molecule to its four nearest neigh- 
bours via a switch block. The output of the molecule’s LUT or flip-flop can also 
be connected to any of the switch block’s outputs. Each input of the LUT can 
be connected to any of the switch block’s inputs. 

A one-to-one genotype (the configuration data) to phenotype (the resultant 
hardware) mapping is employed. The configuration of each molecule is controlled 
by a configuration register; the coordinate of the cell selects the appropriate gene 
from within the genotype to fill this configuration register. 

The coordinates for each cell are calculated based upon its neighbours; each 
cell increments the address in the x and y axis which is then propagated on. 
Faults are accommodated by typically killing the entire column in which the 
faulty cell is present. A gene is selected that sets all the switch blocks to be 
transparent and the cell coordinates are no longer incremented. However, it is 
also possible to kill a row. This is necessary if the cell will not die appropriately 
or there is a fault present in the transparency of the cell. This allows a cell to 
be completely removed, no matter the degree of damage. More details can be 
found in [8] 



3.3 The Artificial Immune System 

The fault detection takes the form of a number of immune layers. The top-most 
layer is the implementation of a negative selection algorithm which is comple- 
mented by the innate layer. 



The Negative Selection Algorithm. Using the negative selection algorithm 
to identify unusual situations at system level and just considering the current 
state reduces the number of error states dramatically. In this example the number 
is reduced to such an extent that there are more self states than non-self states. 
This makes partial matching between detectors and the current state no longer 
possible, since the partial match is more likely to match with another self state 
than non-self state. The number of holes becomes too large to be practical, even 
with appropriate changes in the state representations [19]. The nature of the 
partial match within the negative selection algorithm (and in biological systems) 
is very powerful and, some may argue, a key component of the system. It allows a 
greatly reduced set of detectors to identify a wider range of pathogens. However, 
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when implemented in hardware there are a number of constraints and restrictions 
that prevent full advantage of the partial matching. The luxury of complex 
software algorithms is not available. The reduced number of non-self states, and 
the small number of bits in this example allows a complete detector set which is 
still very small. With a detector of only five bits a complete comparison can be 
implemented in a simple five input lookup table (implemented in three 4 input 
LUTs). This compares favourably with the typical CAM implementation used 
by Bradley [5] which required a LUT for every two bits of a detector and used 
over 100 detectors. The system still learns and identifies non-self states which is 
the fundamental aspect of the algorithm. 



Acquired immune 
layer 




states 



Error 

Signal 



Fig. 2. System Block-Diagram 



Figure 2 shows a block diagram of the controller and acquired immune sys- 
tem. The implementation used cells of 10 molecules; the controller, complete 
with LFSR for simulated object detection, is contained within one cell and the 
acquired immune system in another. 

The acquired layer learns during a teaching period. This is a simple process 
since all detectors are present. No random number generator or other complex 
elements are required and so it is practical in hardware. This is different to the 
Forrest implementation of the negative selection algorithm that first generates 
a number of detectors that are then destroyed if not appropriate. Here we start 
with the complete set of detectors that are then reduced. Initially all states are 
set to indicate a fault. During the learning period states that occur change the 
appropriate bit within the detector’s LUT. Hence the output generated from that 
state no longer indicates an error; the detector that matched self was destroyed. 
A learning line is fed to the immune cell to prevent it from killing the control 
cell during the learning process. Since the normal genotype is reloaded following 
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the innate testing process (see section 3.3), the changes to the detector set are 
changed within the genotype to restore them when the genotype is reloaded. 
This is performed externally to the embryonic array but still in hardware within 
the FPGA. 



The Innate Layer. In this demonstration implementation the innate immune 
layer is used to confirm that the fault detected by the acquired layer was not 
such that the cell could not become transparent. In a more complex system tests 
could be performed to localise the fault. This is achieved by reconfiguring the 
device and performing test patterns at a low level. The number and type of 
tests performed can generate an exhaustive test sequence in a manner similar to 
Sundararajan [29]. A large number of tests would be performed in parallel, with 
the repetition of the same gene in a number of the cells. 

A control process is necessary to carry out the reconfiguration and error 
checking. On the final POEtic device a microcontroller will be included within 
the device which will be utilised. However, this generates a large single point 
of failure. In an ultimate solution the controller would be implemented within 
the array itself which then offers all the fault tolerant protection provided. It 
should be noted that unlike many other FPGA fault tolerant techniques that 
use reconfiguration to locate and/or avoid faults (such as [3]) there is no complex 
reconfiguration calculation required; this is all achieved by the embryonic array. 
Hence, an integrated controller would be practical. In this example the control 
was provided within the FPGA, although not within the embryonic array. 

As previously described, this example application implements the robot con- 
troller in a single cell, any non-self detected causes that cell to be killed and 
its column made transparent. However, innate tests are performed to check the 
correct transparency of the device. Failure of this test results in the appropriate 
row being made transparent. Two tests are required; the second necessary to 
test the transparency of the cell used to generate the test signals in the first 
test. Details can be found in [8]. 

4 Results and Discussion 

Figure 3 shows a simulation of the system. At time 0 the configuration of the 
array is initiated with the controller and acquired immune cell for the learn- 
ing period. The system state is denoted as learning_config and is followed by a 
very brief learning period. Each state generated during this period changes the 
detector set data appropriately. The normal period now commences; first the 
configuration followed by normal running. An expansion is shown in figure 4. 
The lower traces show the output of the system; the pseudo-random sequence of 
the LFSR that simulates the object detection (a time shifted copy of the previ- 
ous trace) together with the motor drive signals. It can be seen that cell (0,0), 
that contains the controller, has an OK state and is using gene 0001. At time 
600/rs a molecule output within the controller has a stuck at one fault injected. 
This is detected (the error line goes high); the cell state goes to dead and gene 
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Fig. 3. Simulation of function stuck at fault 
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Fig. 4. Expanded section of normal running 



0000 (transparent) is selected. Tests 1 and 2 are then performed; this can be 
more readily seen in figure 3. The combined output of cell (0,0) can be seen on 
this trace. Once dead the output remains constant and cell (1,0) takes over the 
controller function. It should be noted that the cell references used here are the 
physical location, not the address calculated within the embryonic array. 

A similar process is shown in figure 5, however, the injected fault is within 
the routing between molecules and hence will not be avoided by making that 
column transparent. In this example the transparent cell is unused but the next 
stage of the work is to inject real inputs from a robot. These would typically 
enter the array on the left and pass through this cell, hence an error in the 
routing would block these signals. 

As before, cell (0,0) dies when the fault is detected but test 2 detects that 
the row can no longer become transparent and so the bottom row also becomes 
transparent (note that cell (1,0) has become transparent). Cell (1,1) now imple- 
ments the controller. Before any faults the states of the six cells can been seen in 
figure 6a. The fault injected in the molecule output results in the states shown in 
6b. Finally the fault in the routing between molecules results in the states shown 
in 6c. It should be noted that a dead cell attempts to become transparent to all 
signals. However, this may not be possible if a fault in the routing is present. 
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The circuit simulated above was synthesised and implemented upon a Xilinx 
Virtex 1000. Both routing and non-routing faults were injected and successfully 
detected and accommodated in a similar manner to the simulations. 

5 Further Comments 

5.1 Advanced Learning 

The current learning requires a period of fault free operation during which all 
the non-fault states are presented. Although there are applications where this is 
possible, this can become a non-trivial task in some complex systems. Biological 
immune systems are presented with a similar problem; not all self proteins are 
presented to the maturing lymphocytes. However, lymphocytes will only react 
if the immune system is stimulated by other indicators of a pathogen presence 
(including detection of damage by the innate system). Hence a learning period 
could be used to locate most self matching detectors. However, if a fault was 
repeatedly identified by a given detector but no fault was located by the innate 
tests then the detector should be removed. Care would have to be taken over 
temporary faults and design faults. Further work is required here. 

5.2 Temporal Characteristics 

Faults in the controller example are only identified if they cause the device 
to approach an obstacle. It could be possible that the fault forces it to follow a 
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circle, or spin, such that it never approaches an obstacle. A second, small detector 
set could consider the length of time a particular state is held, since the robot’s 
normal behaviour would not include being in a turn state for a protracted length 
of time. 



5.3 Independent Nature 

This paper details a hardware immune system that protects a simple hardware 
robot controller. However, the system is independent of the controller; it makes 
use of a totally different process than the controller. Although it is implemented 
within the same device this is not necessary. The controller (or any other suitable 
application) could be a complex hardware or software device implemented upon 
separate hardware or microprocessor; the hardware immune system can be added 
without interference. Replication of the detector set within an embryonic array 
would provide a robust fault detection system such as that envisaged by Avizienis 
[2] (see section 2.1). 

6 Conclusions 

A simple robot controller has been implemented on a commercial FPGA which 
makes use of biologically inspired layers to provide fault tolerance. An immune 
system that learns normal behaviour is used for fault detection which then makes 
use of an embryonic architecture to easily avoid the fault (fault recovery). 

Using the acquired immune system to monitor at a system level for unusual 
behaviour (rather than all faults) allows a very small detectors set to be used 
with an efficient unsupervised learning algorithm. The learning algorithm has 
been implemented in hardware. This should be compared to implementations of 
immunotronics by Bradley and Tyrrell which is the only other hardware imple- 
mentation of a negative selection algorithm. This requires a very large detector 
set which is generated on a host PC in software. Also their system only imple- 
ments fault detection. The use of the embryonic array detailed allows as many 
faults as there are spare cell columns and rows to be accommodated. 

The nature of the controller immunised is of little consequence since only the 
inputs and outputs are considered - the immune system is independent and can 
be added to any suitable system. 
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Abstract. The paper introduces a new method for the design of real- 
world applications of evolvable hardware using common FPGAs (Field 
Programmable Gate Arrays). In order to avoid “reconfiguration prob- 
lems” of current FPGAs a new virtual reconfigurable circuit, whose gran- 
ularity and configuration schema exactly fit to requirements of a given 
application, is designed on the top of an ordinary FPGA. As an example, 
a virtual reconfigurable circuit is constructed to speed up the software 
model, which was utilized for the evolutionary design of image operators. 



1 Introduction 

In recent years, researchers have recognised the essential difference between evo- 
lutionary circuit design and evolvable hardware [Q. 

In case of evolutionary circuit design a single circuit is evolved. An evolu- 
tionary algorithm is used only during design phase and thus it plays the role 
of a “designer” . While quality and implementation cost of the resulting circuit 
are main design objectives, the time needed to evolve a satisfactory circuit for 
a given application may not be critical. One could imagine that it could be eco- 
nomically profitable to evolve a single circuit for one month, provided that it 
can be reused for the next five years. Hence the design can be approached using 
a circuit simulator in software (that is, in extrinsic evolution) . 

On the other hand it can be supposed that a noticeable portion of a tar- 
get (embedded) system will be implemented in hardware in case of real-world 
applications of evolvable hardware (EHW). Such applications typically operate 
in time-varying environments. Then EHW ensures either adaptation (e.g. in 
robotics) or high performance computation (e.g. in image compression), which 
is unreachable using conventional approaches. In contrast to evolutionary cir- 
cuit design, the evolutionary algorithm is a part of a target system. However its 
role is rather to provide suffiient long-time mean performance (quality) of the 
evolved circuits than to find “innovative” circuits. Unlike evolutionary circuit 
design, the amount of resources (gates) needed to realize the final circuit is not 
usually important because some resources are always devoted for evolutionary 
purposes and may be used for free. 

While there exist some ASIC-based (perhaps already commercialized) real- 
world applications of EHW [0, implementations of FPGA-based real-world ap- 
plications of EHW remain practically unexplored. Although FPGAs are widely 
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accessible reconfigurable circuits (RC), the following points represent obstacles 
for their usage in EHW: (1) In current FPGAs the reconfiguration system is not 
sufficient for EHW because the partial reconfiguration is not fully supported (the 
XC6200 family is off the market now and the usage of JBits is too complicated 
[0) and FPGAs can only be reconfigured externally. (2) FPGAs are based on 
Gonfiguration Logic Blocks (GLBs) whose granularity is too fine for evolution 
of complex circuits. And these complex circuits are required in real-world ap- 
plications. Hence in addition to the problems mentioned previously, scalability 
problems of EHW have to be solved in real-world applications m- 

In this paper, we present a method for the design of real-world applications 
of EHW using common FPGAs. A new virtual RG, whose granularity and re- 
configuration schema exactly fit to the requirements of a given application, will 
be designed on the top of a common FPGA. It will enable the implementation 
of a target system in a low-cost, commercial off-the-shelf FPGA and to achieve 
adaptation to a changing environment. The method will be illustrated in the 
design of the virtual RG for an evolvable image pre-processing component. 

The design will be approached in two major steps: (1) Gircuit software sim- 
ulator will be employed during the evolutionary circuit design of some typical 
circuits from application domain in order to find “optimal” organization of the 
virtual RG and evolutionary algorithm. (2) The virtual RG and evolutionary al- 
gorithm will be designed and synthesized into an FPGA according to outcomes 
of step 1. Note that this paper deals only with design of the virtual RG. 

Section 2 introduces the problem domain and summarizes the results we 
obtained from the previous research. The implementation of the virtual RG is 
described in Section 3. Section 4 discusses the proposed approach. And finally, 
conclusions are given in Section 5. 

2 Evolutionary Image Operator Design 

In order to ensure the adaptability of a target embedded system (e.g. to a varying 
type of noise), we have to find such an organization of the RG and genetic 
operators, which will enable us to evolve various image operators and filters in a 
reasonable time. Every image operator will be considered as digital circuit of nine 
8bit inputs and a single 8bit output, which processes gray-scaled (8bits/pixel) 
images. As Fig. Eshows every pixel value of the filtered image is calculated using 
a corresponding pixel and its eight neighbors of the processed image. 

We approached the problem using Gartesian Genetic Programming (GGP) 
[Q. However after a number of experiments we claim that it is practically impos- 
sible to evolve a gate-level image operator of 72 input and 8 output bits, which 
is able to compete against conventional solutions. In order to allow the design 
of such complex circuits, we extended GGP to functional level Q. Instead of 
GLBs and I bit connection wires, Gonfigurable Functional Blocks (GFBs) and 
8bit datapaths are utilized (see Fig. [Qand Fig. 0. 

Similarly to conventional GGP, only very simple variant of the evolutionary 
algorithm is employed. Population size is 16. The initial population was gener- 
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0,l,C;2,5,D;4,8,B;7,6,E;9,ll,C;10,ll,D;x,x,x;12,3,D; ; (output) 24 



Fig. 1. A typical arrangement of an experiment for evolutionary design of image oper- 
ators. A sample chromosome (circuit) is uploaded and the active CFBs are marked. 



ated randomly, however only function “C” (see Fig. E) was used in some runs. In 
case of evolutionary circuit design, the evolution was typically stopped (1) when 
no improvement of the best fitness value occurs in the last 50000 generations, or 
(2) after 500000 generations. 

Only mutation of two randomly selected active CFBs is applied per circuit. 
Four individuals with the highest fitness are utilized as parents and their mutated 
versions build up the new population. In order to evolve a single filter (digital 
circuit), which suppresses a given type of noise, we need an original image to 
measure the fitness of the evolved filter. The generality of the evolved filters (i.e. 
whether the filter operates sufficiently also for other images of the same type of 
noise) is tested using a test set. 

The design objective is to minimize the difference between the corrupted 
image and the original image. We chose to measure mean difference per pixel 
since it is easy for hardware implementation. Let u denote a corrupted image 
and let v denote a filtered image. The original (uncorrupted) version of u will 
be denoted as w. The image size is K x K (K=256) pixels but only the area of 
254 X 254 pixels is considered because the pixel values at the borders are ignored 
and thus remain unfiltered. The fitness value of a candidate filter is obtained as 
follows: (1) the circuit simulator is configured using a candidate chromosome, 
(2) the circuit created is used to produce pixel values in the image v, and (3) 
fitness is calculated as 



K-2 K-2 

fitness = 255. (AT - 2)^ - ^ ^ \v{i,j) - w{i,j)\. 

i=i 

Now we can summarize CGP parameters that lead to efficient evolutionary 
design: RC consists of 2-input CFBs placed in array of 10 columns and 4 rows. 
An input of each CFB may be connected to circuit’s inputs or to the output of a 
CFB, which is placed in the previous two columns (i.e. CGP’s L-back parameter 
is L=2). After statistical analysis of the functions utilized in the evolved circuits. 
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we recognized that CFB should support functions listed in Fig. 0 Let us note 
that functions “C”, “E”, and “F” are the most important for successful evolution. 

A number of unique image filters for Gaussian, uniform, block-uniform and 
shot noise was reported [E0. These filters typically consist of 10-25 CFBs and 
show better quality in comparison with typical conventional filters (such as mean 
and median filters). Surprisingly in some cases the number of equivalent gates 
needed for implementation in FPGA XG4028XLA was also lower than in case of 
conventional filters. The very efficient edge detectors were also evolved using the 
same system. Furthermore, the system was able to produce sequences of very 
good filters in simulated time varying environment (changing types of noise) 
where only 20000 populations were generated per circuit mg. 

3 Virtual RC for Evolution of Image Operators 

The previous section dealt with an evolutionary image operator design prob- 
lem in which it was not any problem to spend 1 day with a single filter design 
on common PG. However if an application is operating in a dynamic environ- 
ment (e.g. an embedded system) it should be considered the software approach 
becomes slow. The speed at which the approach is able to evolve the efficient 
image operators automatically, determines a class of applications for which the 
approach is suitable for. The evolutionary image operator design has clearly 
demonstrated that very time consuming fitness calculation is the most painful 
problem of EHW. As soon as we are able to perform (at least) fitness calculation 
in hardware, we are also able to reduce design time substantially and potentially 
to support the continuous adaptation. 

The idea of virtual RG initially appeared in [E], and was developed in the pa- 
per inj. Some other authors have applied similar methods: Slorach and Sharman 
designed a simple fine-grained programmable logic array on top of an FPGA [ II 2 J 
and Torresen’s group approached a problem of context switching in this way too 
[ ITTT) . Macias has also implemented an 8 x 8 Gell Matrix using Xilinx Spartan- 
2 XG2S-200 FPGA. Other types of specialized platforms have been developed 
[ ll 411 bj . Note that none of these approaches deal with real-world applications of 
EHW operating in time- varying environments. 

3.1 Description of the Virtual RC 

The GGP instance (introduced in Section 2) is in fact implemented in an FPGA. 
As Fig. Qshows the RG under design operates with nine 8 bit inputs (10 - 18) and 
a single 8 bit output. It consists of 29 GFBs allocated in a grid of 8 columns and 

4 rows. The last column contains only a single GFB, which is always connected 
to circuit output (a connection of the circuit output is not evolved). Every GFB 
has two inputs and can be configured to perform one of 16 functions listed in 
Fig. 0 Examples of a hardware implementation of some typical functions are 
obvious from the same figure. 

Every GFB is also equipped with a register to support pipeline processing. 
A column of GFBs is considered as a single stage of the pipeline. In order to 
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Fig. 2. A reconfigurable device for the evolutionary design of image operators. 



synchronize CFBs with the processed inputs, we have to insert 72 bit (9x8 
bits) registers RO - R6 between neighboring columns. The CFBs’ inputs are not 
connected to circuit inputs directly, but they are connected to the appropriate 
registers. It is more economical in hardware to allow a CFB’s input is connected 
to one of 8 data sources than to one of 9 data sources because then only 3 address 
bits are needed for the routing multiplexer. Hence the following restrictions of 
conventional CGP (which are taken in account simultaneously) were introduced 
for this application to reduce the number of configuration bits: 

— An input of each CFB placed in an odd row may be only connected to 
circuit’s inputs 10 - 17 (via registers RO - R6). 

— An input of each CFB placed in an even row may be only connected to 
circuit’s inputs II - 18 (via registers RO ~ R6). 

— An input of each CFB placed in the first (the second) column may be only 
connected to circuit’s inputs via register RO (Rl, respectively). 

— An input of each CFB placed in column 3 (4, 5, 6 or 7) may be connected 
to circuit’s inputs via registers Rl and R2 (R2 and R3, R3 and R4, R4 and 
R5, R5 and R6, respectively) or to the output of a CFB, which is placed in 
some of the previous two columns. 

— An input of the last CFB may be only connected to the output of a CFB, 
which is placed in the previous two columns. 

Now we can derive the theoretical length of the configuration bitstream. Four 
bits are needed to determine the function in each CFB. Four bits also encode 
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Fig. 3. An implementation of functions 5 (sel=0101), A (1010), C (1100), and D (1101) 
that are provided together with other functions in each CFB. Note that >> is a shifter 
and is an adder with saturation. 



connection of every CFB’s input in columns 3-7. We will utilize only three 
bits to do the same with inputs in columns 1, 2, and 8. In total, the length is: 
9(3 + 3 + 4) + 20(4 + 4 + 4) = 330 bits. 

3.2 Routing Logic and Configuration Memory 

The routing circuits are constructed using multiplexers that are controlled via 
bits of the configuration memory. Sixteen 16-input multiplexers are needed for 
selection of CFB’s inputs. These inputs are usually taken either from the outputs 
of the eight CFBs in the previous two columns or from the previous two registers. 
Only 8-input multiplexers are required for CFBs of the first, the second, and the 
last columns since these CFBs utilize only eight possible inputs. 

The configuration memory is implemented using flip-flops that are available 
in CLBs. Thus a two output CLB is employed as a two bit memory. Then all 
outputs of CLBs utilized for the configuration memory are connected to multi- 
plexers that control the routing of CFBs. The configuration memory is depicted 
in Fig. 21 Four CFBs of a single column are configured simultaneously and their 
configuration is stored in a cnfBank configuration register. A column is selected 
via addr and a new configuration is uploaded using ghank configuration port and 
WR signal. Although the number of configuration bits needed for columns 1, 2 
and 8 is lower than 48 bits, the configuration memory employs 48 bits per column 
in our prototype (in order to make the design easier). Hence the configuration 
memory consists of 48 x 8 = 384 bits. 

The proposed configuration approach is a kind of partial reconfiguration that 
may be performed either form outside (if configuration signals are connected to 
circuit’s pins) or from some internal circuits. Obviously, it is impossible to destroy 
the chip when a randomly generated configuration is uploaded. 
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addr 



WR 

gbank 



bit 47 46 45 44 5 4 3 2 1 0 



Yconf Xconf sel Yconf Xconf sel Yconf Xconf sel Yconf Xconf sel 
4bits 4bits 4bits 4bits 4bits 4bits 4bits 4bits 4bits 4bits 4bits 4bits 



CFB4 CFB3 CFB2 CFBl 



Fig. 4. The configuration memory is divided into eight configuration banks cnfBank. 
Each of them is responsible for configuration of a single column of CFBs. 



3.3 Configuration Options 

The virtual RC was described using VHDL at structural level but some elemen- 
tary circuits (such as “Max” circuit) were defined behaviorally. Its functionality 
was tested in the Active HDL v.3.6 environment. 

Assume that the RC has been already configured and L = 1. Then the inputs 
(10 - 18) are processed via the array of CFBs, and simultaneously, the inputs go 
through the register array RO - R6. The first valid output is available in 9 clocks 
(a delay of RO plus 8 CFBs), and the other ones each per a clock. Hence K x K 
pixels can be processed in 9-F (AT x AT— 1) clocks. Maximal operational frequency 
is determined by a delay of a CFB and a delay of routing circuits (which is a 
single multiplexer). 

It takes 8 clocks to configure the entire RC at the beginning of the com- 
putation. However, the reconfiguration process can be hidden totally. We can 
configure the first stage in the same time in which the first input values are 
stored into RO. Then every next stage can be configured while the previous 
stage processes data. Hence such a pipelined approach requires only 1 clock to 
reconfigure the entire device as seen in Fig. 0 

The RC in fact supports A = 1 as well as L = 2 setting. However if A = 2 
is considered, every pixel must be sent to the RC two times to ensure synchro- 
nization of CFBs. Note that CFB’s inputs may be connected to registers in two 
different stages (i.e. to values registered at two different moments) but a CFB 
must operate with the inputs registered at the same moment to produce a correct 
output. This leads to decreasing of the maximal frequency to a half if A = 2. 
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Fig. 5. An example of simultaneous pipelined reconfiguration and execution of the RC 
under design. We can observe the outputs of CFBs placed in the first row {PEc*el) 
and the constant input 14 = 4, which goes via the register array {reg*outJf). In the first 
clock, 14 is stored into regOout^, and the first column of CFBs is configured using gbank, 
addr and wr. In the second clock, PEclel is calculated, 14 is stored into reglout^, and 
the second column of CFBs is configured. The entire RC is configured in 8 clocks and 
the first output {outval — OD) is available in the ninth clock. 



3.4 Implementation Costs 

In order to find out whether the design can be physically realized, we have 
tried to synthesize the RC into various FPGAs using Xilinx Integrated Software 
Environment. The results summarized in Table Q qualify us to claim that it is 
possible to construct virtual reconfigurable devices for real-world applications. 
Furthermore it seems that genetic operators and fitness evaluation can be placed 
on the same FPGA (e.g. 77% XCV2000E’s resources remain unused) and thus 
the entire application could be implemented in a single chip. 

While we reached a sufficient operational frequency (at least for L = 1), 
higher implementation costs are main problem of the approach. If a fixed (either 
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Table 1. The results of the synthesis into various Xilinx Virtex FPGAs. Note that 
no timing constrains were defined for the synthesis program. The maximal operational 
frequency fm is valid only for L — 1. 



Family 


Chip 


Optimized for 


Slices 


Slices [%] 


Equivalent gates 


fm [MHz] 


Virtex E 


XCV2000E 


speed 


4424 


23 


67937 


96.4 


Virtex 2 


XC2V1000 


speed 


4879 


95 


74357 


134.8 


Virtex 


XCVIOOO 


speed 


4489 


36 


68744 


86.7 


Virtex 


XCVIOOO 


area 


4085 


33 


63773 


81.3 



conventional or evolved) operator is considered, about 2000 equivalent gates 
are needed for its implementation. The virtual RC (which can be configured to 
implement the same operator) requires 30 times more equivalent gates. A single 
CFB costs 1009 equivalent gates. So far our design consists of 29 CFBs, 55% 
gates needed to realize the entire virtual RC are devoted for implementation of 
routing multiplexers, configuration memory and the register array RO - R6. 

3.5 Fitness Calculation in Hardware 

If L = 1 then the (virtual) RC can operate at fm = 134.8 MHz. Thanks to 
pipelining, the RC is able to process each pixel at 134.8 MHz too. But fur- 
thermore, the entire evolvable system can operate at this frequency. Recall that 
a deterministic selection mechanism is applied in the evolutionary algorithm. 
Hence a new candidate circuit can always be generated simultaneously with fit- 
ness calculation of another circuit. As soon as the fitness calculation takes a 
half millisecond (e.g. it is 0.48 ms for 254 x 254 pixels), there is plenty of time 
to generate a new candidate circuit if the genetic unit takes place in the same 
FPGA. Because the reconfiguration time of the virtual RC is practically zero, 
the time of evolution is determined as 

" fm 

where {K — 2)^ is the number of processed pixels, p denotes population size, and 
p is the number of generations. 

Fig. □ shows that pipelining of the RC can be extended to pipelining of 
the entire evolvable system. Every output value of the RC is compared with 
the corresponding pixel value of the original image. Then an absolute value of 
the difference is added to the content of an accumulator (which corresponds 
to fitness) in next clock. The circuit, which selects neighboring values of each 
pixel, is taken from m- FIFO is a RT-pixel data structure that may be easily 
implemented in an FPGA. The original image has to be synchronized in similar 
way as seen in Fig. Q 

We can estimate how much time is needed to evolve a single image operator. 
Assume the setting K = 256, p = 16, p = 160000, which requires 1 day on 
PentiumIH/800 MHz in average. If fm = 134.8 MHz, then a filter can be designed 
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Fig. 6. Pipelining of the RC can be extended to the fitness calculation. FIFO ensures 
that each output pixel value is calculated using the right input values. 



in tf. = 20.4 minutes in average using virtual RC. We obtained a speeding up 
more than 70. Some operators were evolved with p = 20000. If such a system 
is implemented in an FPGA, the adaptation time would be about te = 2.6 
minutes. Assume that only pixels and p = 10000 are sufficient. Then 

we obtain the adaptation time = 38.3 seconds which can be considered as a 
real-time adaptation in some applications. And finally, we can expect that it will 
be possible to implement in the future 16 virtual RCs in a single FPGA. Then 
all circuits of the population could be evaluated at the same time. Note that the 
fitness accumulator will have to be replicated 16 times, however only a single 
implementation of input FIFOs and registers is necessary. Such a system would 
be able to adapt to changing fitness function in 2.4 seconds. 

4 Discussion 

We did not implement the virtual RC in a physical FPGA yet. However, there 
is a company, which is interested to do it. Nevertheless the presented results 
show that an adaptive system for image pre-processing can be implemented in 
a common FPGA. If the approach is used only to speed up the evolutionary 
image operator design, then it should take several minutes to find a sufficient 
operator. If generality of the evolved operators is not required, it is possible to 
reduce the design time to tens of seconds. And that could be attractive for some 
real-time applications. Let us summarize the advantages of the proposed method 
for design of virtual reconfigurable devices: 
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— An architecture, which exactly fits to application requirements, can be devel- 
oped for a given problem. It is mainly important for EHW, since hardware 
can reflect the representation used in the evolutionary algorithm. 

— A style and granularity of reconfiguration may be defined according to the 
need of a given application. 

— A virtual RC is described in a platform independent way. Hence it can be 
synthesized into various target devices and with various constraints. 

— The process of a virtual RC design (i.e. its VHDL description) can be fully 
automatic. One can imagine a design environment, in which a designer is 
to press a button and a new virtual RC will be generated automatically 
according to the best parameters of RC and evolutionary algorithm found 
so far for a given EHW-based application. 

— A description of a virtual RC can be offered as an IP macro. 

— It is possible to implement very efficient context switching in Virtex FPGAs 
as it is evident in m- 

Higher implementation costs are the main disadvantages of the proposed 
method. Most resources available on an FPGA are wasted on configuration mem- 
ory and routing logic implementation. It seems that the approach is suitable only 
for coarse-grained architectures since the scaling problem is not so painful. The 
scalability problem can be approached by reduction of configuration options (e.g. 
only local connection could be enabled only) . The number of functions supported 
in a programmable element is not critical from a scalability viewpoint. 

5 Conclusions 

We have clarified that it is possible in principle to implement real-world appli- 
cations of EHW in common FPGAs. We are going to realize the proposed RC in 
Virtex FPGA and to apply the method for design of other applications of EHW. 

We have introduced a new level of abstraction to the hardware design. Virtual 
RCs are supposed to be beneficial not only for EHW, but also for the other 
research domains, e.g. reconfigurable computing. 
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Abstract. We have proposed and developed an evolvable reasoning 
hardware (ERHW) already. In this paper, we apply the ERHW to the 
gene finding task that is one of the most important tasks in the genome 
informatics. We evaluate the ERHW using human and other living or- 
ganisms’ DNAs. The ERHW shows high reasoning accuracy of 95.8% 
and high reasoning speed of 280ns. 

Keywords: evolving hardware system, evolutionary hardware design 
methodologies, genome informatics. 



1 Introduction 

Everyday masses of DNA (deoxyribose nucleic acid) of various living organisms 
are sequenced using DNA sequencers. A sequenced DNA fragment, which is a 
long array of 4 bases (A, G, C, and T), however does not make any sense by 
itself, i.e. one cannot obtain any information from the sequenced result as it 
is m- A gene which is a sequence of DNA that represents a fundamental 
unit of heredity, is hidden in the sequenced fragments. Finding a gene is one 
of the most important tasks in the genome science. The gene finding however 
is biologically a burdensome task, because it has to be done through multiple 
biological experiments requiring long processing time (Human genes occupy less 
than 3% of the whole DNA containing 3 billion bases, but most of them are still 
hidden remaining to be decoded) . Computational approach to high-precision and 
high-speed gene finding methods is thus in critical need in the genome science. 

We have proposed an evolvable reasoning hardware (ERHW) and developed 
its prototype already |3|. In this paper, we apply the developed ERHW to the 
gene-finding task and evaluate its performance using DNAs of human and other 
living organisms. 
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2 ERHW: Evolvable Reasoning Hardware 

In this section, we describe an outline of the ERHW. Details of the ERHW and 
its performance (reasoning accuracy and speed) are well explained in [EJ. Figure 
1 shows the fundamental configuration of the ERHW. In the ERHW, kernels 
shown as ..GA******GTA.., or ..GGGGT*GG***.. in the figure play the most 
important role in the reasoning. Each kernel is generated from a record in the 
database which we used for the reasoning. For example, suppose three DNA 
sequences (records) 

GAAAGTAGG 

GTAAGAAGG 

GGAAGTAGA 

are recorded in the same category (e.g., in the human- related DNA fragments) 
in a database. We apply our genetic algorithm-based learning method (described 
later in this section) to these records in order to make kernels 

GAAAG*AG* 

G*AAGAAG* 

G*AAG*AGA, 

where meaningless letters in that category are changed to don’t care(*) and the 
remaining letters show some rules. 

Reasoning is carried out as follows. A query (an unknown sequenced DNA 
fragment) is compared with each kernel and the number of kernels that match 
the query is counted in every category (Cq ~ Cn) in the counter shown in the 
figure. The category that has the largest number of matched kernels to the query 
is chosen (in the Maximum detector in the figure) and is inferred as the category 
that the query belongs to. 

This simple reasoning algorithm is an extension of the Parzen window method 
[0 that is well known in the pattern recognition field, so that its validity is 
guaranteed theoretically and its reasoning accuracy increases as the number of 
kernels increases. 

Furthermore, kernels can be simply implemented using fundamental combi- 
natorial circuits in the same way as counters and the maximum detector (details 
are shown later). 

The way to generate the kernel using the genetic algorithm (GA) is shown 
in Fig. 2. A population of N chromosomes is made for each record (DNA record 
in the database). Thus, if we use Rn records in the database, we make Rn 
populations and the following GA process is executed on each population. The 
chromosome is composed of I fields corresponding to the I letters in the record. 
Each field (locus) is filled with either operator (gene) or where 
changes the letter to don’t care while remains the letter. And a potential 
kernel (kernel candidate) is generated by operating the chromosome onto the 
corresponding record and is evaluated under the following fitness using all Rn 
records. The fcth chromosome in the population for jth record in the ith category 
evolves under the simple GA process with a fitness / of 
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Kernel 




Answer 

’=>C, 



Fig. 1. ERHW configuration. 

^ (^‘‘) = T^' 

where h increases as the number of records in the same category that match 
the kernel increases, while g increases as the number of records in the different 
category that match the kernel increases. Therefore, a chromosome of high fitness 
means that the kernel generated by the chromosome matches (includes) many 
records in the same category and mismatches (excludes) many records in the 
different category. This means that the kernel extracts some rule or a part of 
rule in that category. Finally in the GA process, the kernel is made up using 
the chromosome of the highest fitness by operating it onto the corresponding 
record. After the GA processes for all records, Rn kernels are generated and 
assembled separately as shown in Fig. 1. 

3 Splicing Site Prediction 

Genes that carry hereditary information are hiding in the DNA, and their tran- 
scription regions consist of two kinds of segments (sub-sequences) called exon and 
intron (Fig. 3). The exons are translated via mRNAs (messenger rebo-nucleic 
acids) to organize specific proteins, while the introns are discarded as meaning- 
less information. The exon-intron boundary, or splicing site is one of the most 
typical marks of gene. Therefore, finding splicing sites in the unknown DNA 
fragments is regarded as one of the best approaches to find gene. 

It is already well known that the left (5’ side) and the right (3’ side) borders 
of the intron are necessarily GT and AG, respectively [Q. However, no further 
certain rules are currently known about exon-intron boundaries. The fact that 5’ 
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Chromosome Candidates 




Fig. 2. GA process for kernel generation. 



and 3’ borders in introns are GT and AG respectively is the necessary condition 
(i.e., a boundary between an intron and an exon is necessarily GT or AG) but 
not the sufficient condition. A huge number of GTs and AGs are in the DNA 
sequences, and more than 99.99 % of them are not boundaries (Fig. 3). Thus, 
finding the true boundaries is one of the most difficult problems in the genome 
informatics. 

Whether each GT / AG is boundary or not is expected to be subject to some 
particular arrangements of bases around the boundaries, i.e., some rules, or fea- 
tures are expected to be hiding in the sub-sequence of 

XXXXXXXXGTXXXXXXXX or XXXXXXXXAGXXXXXXXX , 
where X represents A, G, G, or T (the number of Xs is discussed later). 

4 Prototype Development 

4.1 Genome Data and GA Operation 

We used the GenBank database^ which is one of the most well-known genome 
database, in the prototype development. A set of 11,000 records of GTs with 
their neighboring bases like XXXXGTXXXX (X represents a base of A, G, G, 
or T) was collected from more than 100 human-related DNA sequences (e.g., 
hemoglobin, interleukin, etc.), where 1,000 records out of 11,000 contained true 
boundaries (category ‘true’) and the remaining 10,000 records did not contain 
boundaries (category ‘false’). We used the 10% of the total records (i.e., 100 



^ http://www.ncbi.nih.gov/ Genbank/GenbankOverview.html 
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mRNA 



Protein 




Fig. 3. DNA structure and splicing site. 
Table 1. GA parameters. 



Population 


20 


Crossover Rate 


10% 


Mutation Rate 


0.1% 


Number of Generatiou 


100 



‘true’ records and 1,000 ‘false’ records) to construct the kernels in EHRW and the 
remaining 90% (i.e., 9,900 records) were used as queries (test data) to evaluate 
the reasoning accuracy. 

In the GA processing, we used 20 {=N) individuals (chromosomes) in each 
population. In every generation, one pair of chromosomes p and q were chosen 
randomly for the crossover operation (Fig. 4). And two points in the chromosome 
were also chosen randomly. The fields between the chosen points were exchanged 
between the two chromosome p and q each other. In the mutation operation, 
0.1% fields in all chromosomes were randomly selected, and operators (genes) 
in the fields were exchanged to the other one. In the selection operation, the 
elitist strategy was used. The GA operations above were iterated until the 100th 
generation. Then, the chromosome of the highest fitness was chosen in each 
population to make up a kernel. GA parameters are listed in Table 1. 
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Fig. 4. Crossover operation. Two-point crossover between two chromosomes is used in 
the operation. 



4.2 Hardware Configuration 

Up to this point, we have used don’t care for the letter (i.e., A, G, C, or T) 
in order to make our approach clearly (of course, this approach can be used 
practically). However, in the development we encoded the letters to the binary 
codes as A 1000, G 0100, G 0010, and T ^ 0001 in order to increase 
the search space. Figure 5 shows a kernel example implemented using basic logic 
gates. One base is represented by the four-input AND gate in which only one 
input is positive and the others are negative (the reverse is also applicable). 
Therefore, if we use a record of 10 bases adjacent to the GT, its kernel circuit 
has 40 (10 X 4) inputs totally. In the prototype development and its evaluation, 
we could not know how many bases adjacent to GT affect the boundary, so that 
we carried out the GA process changing the number of the bases adjacent to the 
GT. 

We developed a ERHW prototype (Fig. 6). The prototype contains seven 
FPGA chips (Xilinx XGV300-6GB432), data buffer memory chips, and some 
peripheral circuits in it. Four FPGAs out of seven are to implement kernels and 
counters, and one FPGA is to implement the maximum detector. Two remaining 
FPGAs (I/F chips) are for PG (host computer) interface circuits. The announced 
number of the system gates of XGV300 is 300k gates. Therefore, roughly 1,500k 
gates (300k x 5) can be used for the reasoning circuits, which are expected to 
be enough to implement many large practical reasoning applications. 

The four FPGAs are connected with one of the I/F FPGA via the wide data- 
broadcast bus of 256-bit. Input and output data latches (flip-flops) are placed 
in each FPGA. Net reasoning time for one query thus needs three clock periods. 
And four clock periods, which are required in the I/F FPGAs to connect with 
the PG are added to the net reasoning time. Gonsequently, the reasoning time for 
one query in the prototype requires seven clock periods totally. A clock oscillator 
of 25 MHz is currently used in the prototype. Therefore, the reasoning time for 
one query is 280 ns (l/25MHz x 7 periods). 
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Fig. 5. Kernel Circuit. Four bases of A, G, C, and T are implemented using 4-input 
AND gates. [C] in the figure means a part of inputs of the AND gate for C are changed 
to don’t cares. 



5 Performance Evaluation 

Figure 7 shows experimental results of reasoning accuracy as a function of the 
total number of bases adjacent to the GT. In the experiment, we used 9,900 test 
records that are not used to make kernels as mentioned above. The accuracy 
is saturated at 6 bases and the highest accuracy of 95.8 % is obtained at 6 
bases. This saturation around 6 bases means that the important rules, or base- 
arrangement patterns to control whether the GT is true boundary or not are 
embedded in the 6 bases (3 bases on each side of the GT). 

We measured the reasoning accuracy of the other living organisms using the 
human kernels. Results are listed in Table 2. High reasoning accuracy of 84.0 % 
was achieved in gorilla, while the reasoning accuracy considerably degreased in 
yeast and dictyostelium discoideum that is one of the most primitive organisms 
changing its form between animal and plant. Those experimental results mean 
that the kernels generated using human DNA preserve rules in the mammals, 
but the rules have been evolved and do not match the primitive organisms. 
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Answer i ^ 
Connection to PC 



Fig. 6. Block diagram of the ERHW. 



We also compared our result with those obtained using neural networks (Ta- 
ble 3) reported in 0. In the table, “Neural Network” represents a single feed- 
forward multi-layered neural network and “Multi-modal Neural Network” rep- 
resents multiple use of that network with a decision layer. We cannot compare 
the results each other directly because the data set used in our experiments was 
different from those used in [0. However, we can say that ERHW may have 
higher reasoning accuracy than a single neural network, and have almost the 
same reasoning accuracy as the multi-modal neural network. 

We used the Xilinx Foundation Logic Synthesizei0to synthesize the kernels, 
counters and the maximum detector. The ERHW designed using 1,100 DNA 
records requires 178,004 gates, and the prototype has enough capacity to imple- 
ment those circuits. 

Table 4 lists the measured reasoning time for one GT boundary. The reason- 
ing speed of the prototype is by 1,000 times or more faster than that of the PC 
(Pentium HI 1 GHz with 128 MB memory). The reason of this high speed pro- 
cessing in the ERHW in spite of low clock speed is that the circuits are designed 



^ http//:www. xilinx.co.jp/ 
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Fig. 7. Experimental results of reasoning accuracy as a function of the total number 
of bases adjacent to the GT. 

Table 2. Reasoning accuracy of some organisms using the human kernels. 





human 


gorilla 


dictyostelium 

discoideum 


yeast 


Prediction Accuracy 
Using Human Kernels 


95.8% 


84.0% 


48.0% 


12.0% 



directly from the DNA records, so that higher parallelism is achieved with less 
number of circuits. 



6 Conclusions 

The evolvable reasoning hardware (ERHW) was applied to the splicing site rea- 
soning that was one of the best approaches to the gene-finding. Performance 
was evaluated using GenBank database, and the ERHW showed high reason- 
ing accuracy of 95.8%. It also showed high reasoning speed of 280 ns that was 
faster than that of a personal computer (Pentium III IGHz) by 1,000 times with 
a small circuit size of 178,004 gates. The kernels generated from the human’s 
records also showed high reasoning accuracy to gorilla’s records, but not to other 
primitive organisms’ records. This result seems to indicate that some hereditary 
rules in the mammals are extracted in the generated human kernels. 
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Table 3. Comparison of reasoning accuracy with neural networks. 



ERHW 


Neural Network 


Multi-modal 
Neural Network 


95.8% 


83.4% 


95.0% 



Table 4. Comparison of reasoning-time. 



ERHW 


PC (Pentium HI 1 GHz) 


280.0 ns 


382 s 
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Abstract. In this paper, we propose an evolvable fuzzy system for ATM cell 
scheduling. When the scenarios of cell flows in an ATM network change dra- 
matically, traditional scheduling algorithms, first-in-first-out (FIFO) and static 
priority, which employ static switching scheme may see their efficiency dete- 
riorate. An alternative is to use dynamically weighted priority, which changes 
the priority based on the cell delay. Still, the algorithm per se is fixed and on 
the hardware level, nothing changes. Our goal in this paper is to achieve hard- 
ware adaptability and demonstrate its applicability in ATM switching. To do 
this, we proposed an evolvable fuzzy system [1]. With evolvable fuzzy system 
(EES), the system undergoes transformation in response to changes in the cell 
patterns in order to maintain operational efficiency. We envisage that such a 
system is potentially useful for demanding real-time dynamic control applica- 
tions. 



1 Introduction 



In recent years, ATM has become a universal network, because it can offer bandwidth 
support from telephony network to Internet. Due to the diversity of applications it can 
support, the bandwidth of ATM is critical. The efficiency of cell scheduling is there- 
fore very important. In order to solve the problem of cell scheduling, many algorithms 
have been designed to cater for the desired QoS (quality of service) requirement. The 
common switching schemes are FIFO, static priority (SPR) and DWPS (dynamically 
weighted priority scheduling) [2]. The main attributes for determining QoS are cell 
loss and cell delay [3]. FIFO is a very simple method to do cell scheduling. It is easy 
to implement in hardware, but is not very good in terms of QoS performance. Static 
priority is also a simple method to do cell scheduling. However, it tends to be unfairly 
biased towards Class 1 cell [1,2], which will always have a higher priority than other 
cell classes. DWPS is a significant improvement over the static SPR scheme. It ad- 
justs the priority according to the cell flow scenarios [2]. But the adaptation scheme 
tends to be simple and may not be very efficient if the cell flows change dramatically. 
DWPS is also more difficult to implement in hardware. 

Fuzzy logic is a powerful methodology in embedded control. It has been widely 
used in many real time applications since its inception. Cell flow scheduling is also a 
real time control problem, hence ideal for fuzzy control. But it can be tedious and 
complicated to manually formulate the fuzzy rule set for efficient cell scheduling in 
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all scenarios. In this respect, an evolvable fuzzy system is very appropriate. With 
EPS, an optimal fuzzy rule set can be consistently maintained. In many fuzzy sys- 
tems, rule set is manually derived based on intuition. This can be limited since the 
manually derived rule set may only be efficient for specific period of cell flows. 
Based on this premise, the principle behind EES is therefore to evolve the rule set 
accordingly, which in this case is the changing patterns of cell flows. The adaptation 
of the rule set is achieved through symbiotic hybridization of an evolutionary algo- 
rithm embedded with fuzzy inference mechanisms [1,4]. 

If the pattern of cell flows changes, the fuzzy rule set should be changed accord- 
ingly to maintain consistent performance. To do this, online evolution to adapt to the 
cell flows can be implemented. Ultimately, it is desirable to implement the system as 
a form of intrinsic evolvable fuzzy hardware [1]. In principle, for such a scheme to be 
workable, it is just as important to address the architectural and hardware implementa- 
tion issues pertaining to evolvable fuzzy system. 

In this article, we will present some preliminary simulation results to demonstrate 
the potential usefulness of the proposed EPS for ATM scheduling. The QoS perform- 
ance of the EES is compared with the other three schemes, namely FIPO, SPR and 
DWPS. 

This paper consists of 4 sections. In Section 2, we describe our proposed EES in 
the context of ATM cell-switching problem. Section 3, outlines the simulation meth- 
odology. Results of simulation and comparisons with three other scheduling schemes 
will be presented. Finally in Section 4, we present some concluding remarks and 
outline some aspects of our future work. 



2 Evolutionary Framework - ATM Cell- Scheduling 

2.1 Problem Descriptions 

The simplified block architecture of the ATM cell multiplexer is as shown in Fig.l. In 
the block diagram, BUFl and BUF2 refer to buffers while MP refers to the multiplex- 
ing unit. The fuzzy switching control block is the part of the hardware that handles 
cell scheduling. The control scheme is derived through some evolutionary mecha- 
nisms, which forms the backbone of the EES. For illustration, we classify the ATM 
services into two types. Class 1 and Class2. 

For ATM switching, Classl can be a form of CBR (Constant Bit Rate) traffic, rt- 
VBR (real-time Variable Bit Rate) or both [5]. Class2 traffic type may refer to nrt- 
VBR (non-real-time Variable Bit Rate), UBR (Unspecified Bit Rate) or ABR (Avail- 
able Bit Rate) [5]. Classl type is delay sensitive while Class2 is considered to be not 
sensitive to delay. These two cell sources must be multiplexed on the OUTPUT by the 
MP unit through time division. We assume that the capacity of the OUTPUT to be the 
maximum of the input channels. It is also assumed that the capacities of the input 
buffers are finite and fixed. 

To begin with, we define two symbols for the inputs, and c^. The symbol Cj re- 
fers to the status of Classl cell, which is a function of V, and V . V, is the current 
cell rate of Classl cell flow while V is the maximum cell rate of the line capacity. 
The symbol c, refers to buffer status of BUF2, which is a function of L, and L . L, is 
the number of empty units in BUF2 while is the length of Class2 cell buffer. For 
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Cj and c^, the memberships are characterized by the linguistic term set {very small, 
small, medium, large, very large}. The output of the MP unit OUTPUT is character- 
ized by the set {true, false}. It is clear that OUTPUT need not be fuzzy and hence 
characterized by two discrete values. Functionally, a true means that the MP unit 
allocates time packets to cater for the Class 1 cell flow in BUFl. A false implies 
switching reverting to cells in BUF2. 



A-— 

► 



ClassJ. 



Class2 



Fuzzy Switching 
Control 



BUFl 


-► 


— ^ — 
M 






P 


BUF2 


— ► 





tz. 



OUTPUT 



Fig. 1. Simplified multiplexer block diagram 



Based on the above characterization of the switching network, it is possible to de- 
fine the n-rule heuristics to control the switching behavior. With the fuzzy member- 
ships defined, one can rely on intuitive logic to define the necessary input-output 
mappings as shown in Fig. 2. The 25-rule system serves as the default ATM cell 
scheduling algorithm on system startup. We refer to these rules as the core rule set. It 
is noted that although our illustration shows an exhaustive rule set, actual systems 
may start off with a core rule set of n being less than 25. 



2.2 Genetic Coding 

As an illustration of the coding mechanism, consider the rule set in Table 1. The ge- 
netic code for the 25-rule system is represented as a genetic string structure 
“1222211122111121111211111”. The allelic code 1 and 2 correspond to the labels 
true and false respectively. The position of the gene in the string corresponds to a 
specific rule in Table 1, when interpreted accordingly in a row wise manner. 



2.3 Evolution of Rule Set 



As rationalized above, we can simplify the maximum rule set system with one that 
utilizes fewer rules. For example, 0220001200011000011000011 refers to a 10-rule 
fuzzy system. In the string, 0 implies that there is no rule defined for the specific 
inputs scenario. Using GA, an optimal fuzzy rule set for the cell multiplexer can be 
derived through learning from past data on ATM traffic flow. 
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Table 1. A 25-rule fuzzy system for ATM cell scheduling. 





very 

small 


small 


Medium 


large 


very 

large 


very small 


T 
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F 


small 
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T 
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F 


F 


Medium 


T 


T 


T 


T 


F 


Large 


T 


T 


T 


T 


F 


very large 


T 


T 


T 


T 


T 



T = true, F = false 



The most important component necessary for evolution is a mechanism to incorpo- 
rate the fitness function in order to control the evolution. Based on the problem speci- 
fications, the goal is to use evolvable fuzzy system to schedule ATM cells such that 
the QoS requirements are satisfied. The QoS considered parameters are cell delay and 
cell loss ratio. The overall fitness function can be defined as follows: 

^ ( 1 ) 

F = /c-(Aj] g(N(t),m(t))+0^\7i-72\) 

t=\ 



Fitness is evaluated based on the scheduling performance for T number of cells, 
and 72 the cumulative number of cell loss for Class 1 and Class2 respectively. 

/I is a weight parameter for the cell delay of Class 1 cell flow, ^is a weight parameter 
to emphasize the significance of cell loss in the fitness function. Two additional pa- 
rameters N(f) and m(f) are defined for the above fitness function. N(t) refers to the 
maximum permitted delay time for Class 1 cells, which can be determined. m(t) is the 
waiting time of the cells in BUFl before being sent. The function g(N(t),m(t)) which 
describes the measure of switching performance in terms of cell delay can be written 
as follows: 



g{N(t),m{t)) 



j N{t)-m{t) N(t)>m{t) 

- N{t)) X CO N(t) < m{t) 



( 2 ) 



In the function g(f, (ois the penalty factor, a constant to account for cells that exceed 
the maximum permitted delay. Typically, (O should be greater than 1, depending on 
the level of tolerance. In general, the overall performance of the switching system 
corresponds to the fitness value. The parameter is a very large constant. For this 
fitness function, the parameters (O, 8 and X can be tuned accordingly based on the 
system’s requirement. 



2.4 Evolution Scheme 

In order to satisfy the real time requirement and online evolution, we design the sys- 
tem architecture as in Fig. 2. 
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Fig. 2. Adaptation framework for EFS 

In this system, the training buffers TBl and TB2 are used to store Classl and 
Class2 cells respectively. The size of TB 1 and TB2, which corresponds to parameter 
T in Eq.l, is 2 or 3 times that of BUFl and BUF2. When either TBl or TB2 is full, 
the evolutionary process is triggered. Fitness evaluation is carried out by subjecting 
each chromosome to the scheduling model according to the cell flow stored in TBl 
and TB2. The purpose of the scheduling model is to emulate the function of the ATM 
network as in Fig.l. If a chromosome that corresponds to a system rule set is better 
than the working chromosome, the working chromosome is replaced immediately. 
Functionally, the scheduling model emulates the ATM network to derive the cell 
delay and cell loss parameters. These parameters enable the fitness value to be calcu- 
lated based on Fq.l. Basically, the evolution module serves to evolve the fuzzy rule 
set and it interacts with the scheduling model to evaluate the fitness of each evolved 
rule set. If evolution is triggered, it works in the background while the MP unit is in 
operation. 



3 Simulation 

The EFS for ATM cell scheduling proposed in this paper is simulated to study its QoS 
performance in terms of cell loss and cell delay. The EFS simulation is implemented 
in C-H-. Similarly, the FIFO, SPR and DWPS schemes are also simulated for per- 
formance comparison. 

In our system simulator, cell flow traffic is simulated by reading from two input 
file streams. Each input stream corresponds to one cell class. The simulator reads 
from the files and checks if the arrival time of each cell is less than or equal to the 
simulation time. If it is, the cell is stored in BUF# (BUFl or BUF2) and TB# (TBl or 
TB2 respectively). The simulation time step is set at 0.01 ps. 

The data format of each cell in the input files is as shown in Fig. 3. For Classl cell 
flow, the Max_delay or maximum cell delay allowed can be specified based on sys- 
tem requirement. It is common to assume Max_delay to be 0. However, in order to 
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eliminate the too good service problems [6], Max_delay can be given an appropriate 
value, which corresponds to the maximum delay that the receiver can tolerate. 
Bit_rate refers to the current bit rate. The Next_bit_rate is the bit rate of the next cell. 
The only difference between Class2 and Class 1 cell is that Max_delay for Class2 cell 
is not used because in the fitness function, cell delay for Class2 is not considered. 

Cell# Arrival_time Max_delay Bit_rate Next_ bit_rate 



Fig. 3. Cell format 




Fig. 4. Simulation flow diagram 
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The overall flow diagram of the sysfem simulator is as shown in Fig.4. T ^ and 
refer to the cell arrival time for Class 1 and Class2 respectively while tis the current 
running time of the simulation. For the EFS simulation, the parameters chosen are 
9=2 and A=4.5. The size of BUF# is set at 100 units while TB# is 3 times that of 
BUF#. 

A robusf scheduling algorifhm should perform well under various cell flow condi- 
tions [7], Two kinds of cell flows representative of most cell flow scenarios are simu- 
lated. After the simulation, we compare the results of our algorithm with the other 
three scheduling methods that are popular in ATM cell scheduling. In both scenarios, 
the simulation time is 5 seconds. 



Class 1 



4 . 






Class2 




Fig. 5. Two classes of cell flows 



3.1 Simulation Scenario 1 



In this part of the simulation, we generate the traffic cell flows as in Fig. 5. 

Class 1 is the CBR cell flow with cell rate of 155.52MHz. Class2 is VBR cell flow, 
also with a cell rate of 155.52MHz. The VBR specified has a 2ms ON period and a 
2ms OFF period. 

We compare the results of EFS scheduling with FIFO and dynamic priority cell 
scheduling. In this scenario, static priority scheduling can only send Class 1 cells, so it 
need not be considered. The following are our simulation results. The comparison 
between the three schemes in terms of cell loss and cell delay is presented in Fig. 6 
and Fig. 7. 

In Fig. 6, the difference in cell loss for Class 1 and Class2 for the three scheduling 
algorithms is not significant. The total cell losses for the three scheduling algorithms 
are about the same. 





Fig. 6. Class 1 cell loss and Class2 cell loss 




Evolvable Fuzzy System for ATM Cell Scheduling 215 






FIFO 

DWPS 

EFS 



H 

O 

w 



EFS 

DWPS 

nFO 



•Memo 7:aioi]e )X4oai 

Hme (us) 



4CCOIM sfludaa 



{■sucoo MC(i)D0 ](K>:<ns «Kmna <rtxo;o 

Ti»ne (US) 



Fig. 7. Class 1 and Class2 cell delay 

From Fig. 7, it can be seen that cell delay for Class 1 cell using evolvable fuzzy sys- 
tem is slightly smaller than the cell delay of the other two algorithms. The cell delay 
of Class2 cell flow using evolvable fuzzy system is greater than that of the other two 
algorithms. But in terms of cell loss and cell delay in ATM network, EFS is able to 
achieve less Class 1 cell delay and at the same time, achieve a balance of cell loss 
between Class 1 and Class2 cell flow. 



3.2 Simulation Scenario 2 



In this part, we generate cell flows as in Fig. 8. 



Class 1 



Class2 




► 



► 



Fig. 8. Two classes of cell flows 

Class 1 refers to CBR cell flow with a cell rate of lOOMHz. Class2 is VBR cell 
flow with unknown random cell rate. The minimum cell rate for VBR is 55.52MHz 
while the maximum is 155.52MHz. In this scenario, since the sum of the CBR and 
VBR cell rate is larger than the out channel’s capacity, cell loss is unavoidable. Hence 
it is very useful to choose the appropriate scheduling method according to the cell loss 
and cell delay QoS parameters. From a practical point of view, this scenario is more 
likely to occur compared to the first scenario. The figures (Fig. 9, Fig. 10) that follow 
show the simulation results of EFS compared to the other three scheduling schemes. 

In Fig.9, Class 1 cell loss for the EFS is not zero, unlike that of DWPS and SPR. It 
is smaller than Class 1 cell loss of FIFO. But in a way, EFS can maintain a desirable 
balance between Class 1 and Class2 cell flow. The sums of cell losses of the two cell 
classes for the four algorithms are the same. Also, it is noticeable that Class 1 cell 
delay for EFS is much smaller than that of FIFO and DWPS. Although Class 1 cell 
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delay of EFS is larger than SPR, but in terms of keeping a balance between Class 1 
cell and Class2 cell, EFS is much better than SPR. Considering the cell loss and cell 
delay on the whole, EFS performs better than the other three. 




Timelusl Titre iusj 



Fig. 9. Class 1 and Class2 cell loss 
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Fig. 10. Class 1 and Class2 cell delay 



It should also be noted that for EFS, it is convenient and easy to specify the relative 
importance of cell loss and cell delay. This can be achieved by changing the values of 
0 and A. The values of these parameters can also be modified online, which is a spe- 
cial advantage that our proposed EFS framework enjoys. 



4 Conclusion 



From the simulation results presented in this paper, the overall performance of the 
proposed EFS is very encouraging. Our goal is to realize eventually the complete 
ATM cell scheduler on hardware, including the evolutionary mechanisms [1]. The 
simulation of the proposed framework has demonstrated the effectiveness and effi- 
ciency of the EFS scheduling scheme. There are essentially two types of evolvable 
hardware, extrinsic evolvable hardware and intrinsic evolvable hardware [8,13,14, 
15]. From an implementation point of view, it is desirable to realize the evolutionary 
process within the hardware framework of the scheduler. This notion suggests the 
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implementation of a form of intrinsic evolvable fuzzy hardware for dynamic real-time 
applications. 

The results presented in this paper serve to demonstrate the viability of the pro- 
posed scheme. Further work on the evolutionary training algorithm needs to be car- 
ried out in order to improve the overall QoS performance. Detailed study on the archi- 
tectural framework is required to derive a practical and efficient mode of intrinsically 
evolvable fuzzy hardware system. Our work discussed in this paper forms the basic 
framework of embedded fuzzy control and embedded evolutionary learning, which 
ultimately is realizable with the current system-on-chip (SoC) technology. Further 
ongoing work in this area will consider the direct implementation of the evolvable 
fuzzy hardware. 
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Abstract. In this paper, we propose the use of Information Theory 
as the basis of the fitness function for Boolean circuit design. Boolean 
functions are implemented by means of multiplexers and genetic pro- 
gramming. Entropy based measures such as Mutual Information and 
Conditional Entropy are investigated as tools for similarity measures 
between circuits. A comparison of synthesized (through evolution) and 
minimized circuits through other methods denotes the advantages of the 
Information-Theoretical approach. 



1 Introduction 

Entropy is a measure of disorder and the basis of Information Theory (IT) 
E- Shannon [01 suggested the use of information entropy as a measure of 
the amount of information contained within a message. Thus, entropy tells us 
that there is a limit in the amount of information that can be removed from 
a random process without having any information loss. For instance, in theory, 
music can be compressed (in a lossless form) and reduced up to its entropy limit. 
Further reduction is only possible at the expense of information lost. 

The IDS algorithm for the construction of classifiers (based on decision trees) 
is probably the best-known computer science representative that relies on en- 
tropy measures m- For IDS, an attribute is more important for concept classi- 
fication if it provides greater “information gain” than the others. 

IT was first used by Hartamann et al. 0 to transform decision tables into 
decision trees. Boolean function minimization through IT techniques has been 
approached by several authors These methods are top-down, thus, the 

design strategy follows after a set of axioms in the knowledge domain. Luba 
et al. m address the synthesis of logic functions using a genetic algorithm 
and a fitness function based on conditional entropy. Their system needs heavy 
preprocessing of the search space (Shannon’s expansion is applied to the target 
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Boolean function as to find subexpressions whose purpose is to guide the genetic 
search. Only after that, the genetic algorithm is started). 

In this paper we use multiplexers and genetic programming (GP) for the 
synthesis of Boolean functions. We propose a fitness function driven by the 
Normalized Mutual Information between the target function and the evolved 
function. Our system works exclusively in a bottom-up fashion, thus no prepro- 
cessing of the search space is needed. The paper is organized as follows. Section 0 
describes the problem statement. Section 0 introduces basic concepts of infor- 
mation theory used throughout the article. In Section 0we show how entropy 
based methods will prevent convergence of any evolutionary method if not used 
correctly. In Section 0we propose three fitness function based on normalized mu- 
tual information and conditional entropy. Section Qis devoted to experiments, 
and we finish with conclusions and final remarks in Section 0 



2 Problem Statement 

The design problem is the following: find the smallest circuit that implements a 
Boolean function specified by its truth table mm- The design metric adopted 
in this case is the number of components in a 100% functional circuit. The 
process works at “gate-level” and the only component replicated is the binary 
multiplexer. A binary multiplexers implements the Boolean function f = ax + 
a'y, where a is the control and {x,y} the input signals. The use of multiplexers is a 
sound approach because: 1) they are universal generators of Boolean functions, 
and 2) any circuit in the population is the Shannon expansion of a Boolean 
function. The expansion takes the form of and-or sum of products (SOP) which 
are easily represented as decision trees. Therefore, circuits are encoded as trees 
and the approach follows the representation adopted by Genetic Programming. 
Leaves of the tree are only Is and Os (as in a decision tree), and the nodes are 
the variables of the Boolean function. Every variable of a node takes the place 
of the “pivot” variable used in the expansion. 

Definition 1. Boolean Residue The residue of a Boolean function f{x\,X 2 , 
. . . ,Xn) with respect to a variable Xj is the value of the function for a specific 
value of Xj. It is denoted by for Xj = 1 and by for xj = 0. 

f = XjfUj +Xjf\a,^ (1) 

The pivot variable is Xj. For instance, for the function /(a, &, c) = a'b'c + 
a'bc' + ab'c', the residue of the expansion over variable a is: 

/(a, &, c) = a' F{a = 0) -I- aF{a = 1) = a'{b'c + be') + a{b'c') 

Therefore, pivot variable a takes the control of the multiplexer and the two 
residues form the inputs, as shown in Figure 0 

Further expansion of the functions at the mux inputs yields the complete 
tree of muxes implementing the target function. 
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F=AX + A’Y 




Y X 



F=A(B’C’) + A’(B’C+BC) 




a) 



b) 



Fig. 1. a) the binary multiplexer, b) the Shannon expansion using a multiplexer 



3 Basic Concepts of IT 

Uncertainty and its measure provide the basis for developing ideas about Infor- 
mation Theory 0. The most commonly used measure of information is Shan- 
non’s entropy. 

Definition 2. Entropy The average information supplied by a set of k symbols 
whose probabilities are given by {pi,p 2 , ■ ■ ■ ,Pk}, can be expressed as, 

k 

H{pi,p2,...,Pk) = -'^Pklog2Pk ( 2 ) 

s=l 

The information shared between the transmitter and the receiver at either end 
of the communication channel is estimated by its Mutual Information, 

MI{T; R) = H(T) + H{R) - H(T, R) = H{T) - H{T\R) (3) 

The conditional entropy H{T\R) can be calculated through the joint proba- 
bility, as follows: 



H{T\R) = EE p{tirj)log2 ^^^"^^^ (4) 

i—l j—1 

An alternative expression of mutual information is 

M/(T; R) = EE p{t,r)log 2 (5) 

t&Tr&R ' 

Mutual information. Equation Q is the difference between the marginal en- 
tropies H(T) -F H{R), and the joint entropy H{T,R). We can explain it as a 
measure of the amount of information one random variable contains about an- 
other random variable, thus it is the reduction in the uncertainty of one random 
variable due to the knowledge of the other [Q. 
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Table 1. Function F = AB + BC nsed to compnte MI(F;C) 



A 


B 


c 


F=AB+BC 


0 


0 


0 


0 


0 


0 


1 


0 


0 


1 


0 


0 


0 


1 


1 


1 


1 


0 


0 


0 


1 


0 


1 


0 


1 


1 


0 


1 


1 


1 


1 


1 



Conditional entropy is used in top-down circuit minimization methods [0, 
and also in evolutionary approaches 

Mutual information is not an invariant measure between random variables 
because it contains the marginal entropies. Normalized Mutual Information is a 
better measure of the “prediction” that one variable can do about the other m- 



NMI{T- R) 



H{T) + H{R) 



( 6 ) 



Normalized Mutual Information has been used in image registration with 
great success m- 



Example: We illustrate these concepts by computing the Mutual Information 
between two Boolean vectors F and C, shown in Table 0 Variable C is an 
argument of the Boolean function F{A^ B, C) = AB + BC. We aim to estimate 
the description the variable C can do about variable F, that is, MI{F; C). 

We use Equations 0andg]to calculate MI{F-,C). Thus, we need the entropy 
H{F) and the conditional entropy H{F\C). 

Entropy requires the discrete probabilities p{F = 0) and p{F = 1) which we 
find by counting their occurrences 



H{F) 



5, 5 3, 3\ 



0.9544 



The conditional entropy. Equation 0 uses the joint probability p{fi,Cj), which 
can be estimated through conditional probability, as follows: p{f, c) = p{f)p{c\f). 
Since either vector F and C has two possible values, the discrete joint distribu- 
tion has four entries, as follows: 



p{F = 0,C = 0)=p{f = 0)p{c = 01/ = 0) 



p{F = 0, C7 = 1) = p(/ = 0)p(c = 1|/ = 0) 



p{F =l,C = 0) =p{f = l)p{c = 0\f = 1) 



p{F = 1, C = 1) = p(/ = l)p(c = 1|/ = 1) 



5 3 

- X - 

8 5 

5 2 

- X - 

8 5 

3 1 

- X - 

8 3 

3 2 



0.375 

0.25 

0.125 

0.25 



X 
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Now we can compute the conditional entropy by using Equation 0 The dou- 
ble summation produces four terms (since n = 2): 



H{F\C) = 0.9056 

Therefore, MI{F; C) = H{F) - H{F\C) = 0.9544 - 0.9056 = 0.0488. 

4 Entropy and Circuits 

Entropy has to be carefuly applied to the synthesis of Boolean functions. Let’s 
assume any two Boolean functions, FI and F2, and a third F3 which is the 
one’s complement of F2, then F3 yf F2. 



The implications for Evolutionary Computation are important since careless 
use of mutual information can nullify the system’s convergence. Assume the 
target Boolean function is T, then MI{T, F2) = MI{T, F3), but only one of 
the circuits implementing F2 and F3 is close to the solution since their Boolean 
functions are complementary. A fitness function based on mutual information will 
reward both circuits with the same value, but one is better than the other. Things 
could worsen as evolution progresses because mutual information increases when 
the circuits get closer to the solution, but in fact, two complementary circuits 
are then given larger rewards. The scenario is one in which the population is 
driven by two equally strong attractors, hence convergence is never reached. 

The fitness function of that scenario is as follows. Assume T is the target 
Boolean function (must be seen as a truth table), and C is output of any cir- 
cuit in the population. Fitness function is either the maximization of mutual 
information or minimization of the conditional entropy term. This is. 



The entropy term H{T) is constant since this is the expected target vector. 
Therefore, instead of maximizing mutual information the fitness function can 
minimize the conditional entropy. 



H{F\C) = - ( 




H{F2) = H{F3) 



Also, Mutual Information shows a similar behavior. 



M/(F1, F2) == MJ(F1, F3) 



badfitness functional = MI{T,C) = H{T) — F[{T\C) 



badfitnessfunction^2 = H{T) — H{T\C) 
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5 Fitness Function 

Based on Normalized Mutual Information 



So far, we have described the scenario where the population is driven by a 
fitness function based on the sole mutual information. We now propose two new 
fitness functions based on entropy. Let’s assume a target Boolean function of 
m atributes T(Ai, A 2 , . . . , Am), and the circuit Boolean function C of the same 
size. We propose and report experiments using the two following fitness functions 
(higher fitness means that a better solution has been found). 



fitness = (Length{T) — Hamming{T, C)) x NMI{T, C) (7) 



m 

fitnessl = 

i=l 



fitness 

NMI{Ai,C) 



fitness2 = ^ fitness x NMI{Ai, C) 

i=l 

fitnessS = (Length{T) — Hamming{T,C)) x (10 — H(T\C)) 



( 8 ) 

(9) 

( 10 ) 



Fitnessl, Equation Q is driven by NMI{T,C) and adjusted by the factor 
Length(T) — H ammingiT , C) . This factor tends to zero when T and C are 
far in Hamming distance, and tends to Length{T) when T and C are close in 
Hamming distance. The effect of the term is to give the correct rewarding of the 
NMI to a circuit C close to T. Equation 0is designed to remove the convergence 
problems described in the previous section. Fitnessl and Fitness2, Equations 0 
and 0 combine the NMI of T and C with NMI of C and the attributes A^ of 
the target function. Thus, fitnessl and fitness2 pretend to use more information 
available in the truth table in order to guide the search. FitnessS is based on 
conditional entropy and it uses the mentioned factor to supress the reproduction 
of undesirable trees. Since conditional entropy has to be minimized we use the 
factor 10 — H{T\C) in order to miximize fitness. 



6 Experiments 

In the following experiments we find and contrast the convergence of our GP 
system for the three fitness functions defined above. 

6.1 Experiment 1 

Here we design the following (simple) Boolean function: 

F{a, h, c, d) = ^(0, 1, 2, 3, 4, 6, 8, 9, 12) = 1 

We use a population size of 300 individuals, Pc = 0.35, Pm = 0.65, and we 
run our algorithm for 100 generations. The optimal solution has 6 nodes, thus 



224 



Arturo Hernandez Aguirre et al. 



we find the generation in which the first 100% functional solution appears, and 
the generation number where the optimal is found. The problem was solved 20 
times for each fitness function. 

Table 0shows the results of these experiments. 

Table 2. Generation number where the first 100% functional circuit is found, and the 
generation where the optimum is found, for three fitness functions 



Event 


Gen. at htnessl 


Gen. at 6tness2 


Gen. at fitnessS 


100% Functional 


13 ±5 


14 ± 7 


18 ±6 


Optimum Solution 


30 ±7 


30± 10 


40 ±20 



6.2 Experiment 2 

The next test function is: 

F{a, b, c, d, e, /) = ah+ cd + ef 

In this case, we use a population size of 600 individuals, Pc = 0.35, Pm = 0.65, 
and we stop after 200 generations. The optimal solutions has 14 nodes. Each 
problem was solved 20 times for each fitnesss function. 

Table EJshows the results of these experiments. 

Table 3. Generation number where the first 100% functional circuit is found, and the 
generation where the optimum is found, for three fitness functions 



Event 


Gen. at htnessl 


Gen. at 6tness2 


Gen. at fitnessS 


100% Functional 
Optimum Solution 


39± 12 
160 ± 15 


40± 11 
167 ± 15 


50± 12 
170 ± 20 



6.3 Experiment 3 

The last problem is related to partially specified Boolean functions jp. With this 
experiment we address the ability of the system to design Boolean functions with 
“large” number of arguments and specific topology. For this, we have designed a 
synthetic problem were the topology is preserved when the number of variables 
increases. 

Boolean functions with 2k variables are implemented with (2 * 2k) — 1 binary 
muxes if the truth table is specified as shown in Table El 

We ran experiments for k = 2,3,4, thus 4,8, and 16 variables and we have 
contrasted these results with the best known solutions for this problem (reported 
in [P). For completeness, all previous results are reported together with the 
results of the new experiments in Table Q where we use the three fitness functions 
(Equations tSF)ll 1)1 . 
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Table 4. Partially specified Boolean function of Example 3 needs (2 * 2k) — 1 



ABCD 


F(ABCD) 


0 


0 


0 


0 


0 


0 


0 


0 


1 


1 


0 


0 


1 


0 


1 


0 


1 


0 


0 


1 


1 


0 


0 


0 


1 


0 


1 


1 


1 


1 


1 


0 


1 


1 


1 


1 


1 


0 


1 


1 


1 


1 


1 


0 


1 


1 


1 


1 


1 


0 



Table 5. Generation nnmber where the first 100% functional circuit is found, and the 
generation where the optimum is found, for three fitness functions 



k 


variables 


size 


Avg(previous) 


Avg(fitnessl) 


Avg(fitness2) 


Avg(fitness3) 


2 


4 


7 


60 


60 


60 


60 


3 


8 


15 


200 


190 


195 


194 


4 


16 


31 


700 


740 


731 


748 


5 


32 


63 


2000 


2150 


2138 


2150 



All parameters are kept with no change for similar experiments, average is 
computed for 20 runs. The previous experiments use a fitness function based 
on Hamming distance between the current solution of an individual and the 
target solution of the truth table. One important difference is the percentage of 
correct solution found. Previously we reported that in 90% of the runs we found 
the solution (for the case of fitness based on Hamming distance). For the three 
fitness functions based on entropy we found the solution in 99% of the runs. 



7 Final Remarks and Conclusions 

A fitness function using only conditional entropy was tested with no success at 
all. We believe this is a clear indication of a fitness function that does not take 
into account the properties of entropy. In general, the three fitness functions 
work quite well, all of them found the optimum in most cases, thus comparable 
to other fitness functions based on Hamming distances. Entropy based measures 
seem hard to adapt to Evolutionary Computation since the entropy of evolu- 
tionary systems is not well understood (after “creationists” would say evolution 
is imposible because entropy would not allow the development of a system) . The 
final remark is that the convergence time and the quality of results produced is 
comparable with the many experiments we have done before in this area. Based 
on the results shown in Tables Qand 0we would give some advantage to nor- 
malized mutual information over simple mutual information because it is less 
biased. Results from Table Q could imply that mutual information is able to 
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capture “that” relationship between the data that the sole Hamming distance 
can not convey to the population. 
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Abstract. Evolvable Hardware (EHW) has been proposed as a new 
method for evolving circuits automatically. One of the problems appear- 
ing is that only circuits of limited size are evolvable. In this paper it is 
shown that by applying an approach where the training set as well as 
each training vector is partitioned, large combinational circuits can be 
evolved. By applying the proposed scheme, it is shown that it is possible 
to evolve multiplier circuits larger then those evolved earlier. 



1 Introduction 

For the recent years, evolvable hardware (EHW) has become an important 
scheme for automatic circuit design. However, still there lack schemes to over- 
come the limitation in the chromosome string length E0. A long string is re- 
quired for representing a complex system. However, a larger number of gener- 
ations are required by genetic algorithms (GA) as the string length increases. 
This often makes the search space too large and explains why only small circuits 
have been evolvable so far. Thus, work has been undertaken trying to diminish 
this limitation. Various experiments on speeding up the GA computation have 
been undertaken [0. The schemes involve fitness computation in parallel or a 
partitioned population evolved in parallel. Experiments are focussed on speed- 
ing up the GA computation, rather than dividing the application into subtasks. 
This approach assumes that GA finds a solution if it is allowed to compute 
enough generations. When small applications require weeks of evolution time, 
there would probably be strict limitations on the systems evolvable even by 
parallel GA. 

Other approaches to the problem have used variable length chromosomes |^. 
Another option, called function level evolution, is to apply building blocks more 
complex than digital gates |0. Most work is based on fixed functions. However, 
there has been work in Genetic Programming for evolving the functions - called 
Automatically Defined Functions (ADF) |0. 

An improvement of artificial evolution - called co-evolution, has been pro- 
posed [0. In co-evolution, a part of the data which defines the problem co-evolves 
simultaneously with a population of individuals solving the problem. This could 
lead to a solution with a better generalization than a solution based only on 
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the initial data. A variant of co-evolution - called cooperative co-evolutionary 
algorithms, has been proposed by De Jong and Potter |I8I9I . It consists of parallel 
evolution of sub-structures which interact to perform more complex higher level 
structures. Complete solutions are obtained by assembling representatives from 
each group of sub-structures together. In that way, the fitness measure can be 
computed for the top level system. However, by testing a number of individuals 
from each sub-structure population, the fitness of individuals in a sub-population 
can be sorted according to their performance in the top-level system. Thus, no 
explicit local fitness measure for the sub-populations are applied in this approach. 
However, a mechanism is provided for initially seeding each GA population with 
user-supplied rules. Darwen and Yao have proposed a co-evolution scheme where 
the subpopulations are divided without human intervention mil- 

incremental evolution for EHW was first introduced in HH for a character 
recognition system. The approach is a divide-and-conquer on the evolution of 
the EHW system, and thus, named increased complexity evolution. It consists 
of a division of the problem domain together with incremental evolution of the 
hardware system. Evolution is first undertaken individually on a set of basic 
units. The evolved units are the building blocks used in further evolution of a 
larger and more complex system. The benefits of applying this scheme is both a 
simpler and smaller search space compared to conducting evolution in one single 
run. The goal is to develop a scheme that could evolve systems for complex 
real-world applications. Earlier work has shown that the approach is successful 
for character classification and prosthetic hand control There has been 

undertaken work on decomposition of logic functions by using evolution El- 
Results on evolving a 7-input and 10-output logic function show that such an 
approach is beneficial. 

The largest correctly working multiplier circuit evolved - the author is aware 
of, is evolved by Vassilev et al El It is a 4 X Tbit multiplier evolved in a single 
run (8 outputs) by gates as building blocks. In this paper, the multiplier will 
not be evolved in a single run but rather be evolved as separate subsystems 
that together perform a correct multiplication. Normally, evolution is used to 
evolve a circuit with a smallest possible number of gates. In this work, we rather 
concentrate on reducing the evolution time at the same time as being able to 
solve problems not evolvable in a single run. This is motivated by the fact that 
the number of transistors becoming available in circuits increases according to 
Moores Law. Thus, this approach is based on “wasting” transistors to faster 
evolve a circuit or evolve circuits than are not evolvable in a single run. 

The experiments are undertaken for evolving a 5 x 5-bit multiplier circuit. 
There is a substantial increase in complexity between a four-by-four bit mul- 
tiplier (256 lines in the truth table) compared to a five- by- five bit multiplier 
(1024 lines in the truth table). Even though the experiments are for evolving 
multiplier circuits, the principles are valid for any combinational circuit defined 
by a complete truth table. 

The next section introduces the concepts of the multiplier evolution. Results 
of experiments are reported in Section 3 with conclusions in Section 4. 



230 



Jim Torresen 



2 Multiplier Evolution by Data Partitioning 

In this section, the increased complexity evolution is applied to evolve a large 
multiplier circuit. 



2.1 Approaches to Increased Complexity Evolution 

Several alternative schemes on how to apply this method are present: 

— Partitioned training vector. A first approach to incremental evolution 
is by partitioning each training vector. For evolving a truth table - i.e. like 
those used in digital design, each output could be evolved separately. In this 
method, the fitness function is given explicitly as a subset of the complete 
fitness function. 

— Partitioned training set. A second approach is to divide the training set 
into several subsets. For evolving a truth table, this would correspond to 
distributing the rows into subsets, which are evolved separately [E]. This 
would be similar to the way humans learns: Learning to walk and learning to 
talk are two different learning tasks. The fitness function would have to be 
designed for each task individually and used together with a global fitness 
function. Later in this paper it is shown how this can be done in digital 
design. 

In this paper, these two approaches are combined. 



Table 1. The truth table for multiplying two by two bits. 



X * Y = Z 

00 * 00 = 0000 
00 * 01 = 0000 
00 * 10 = 0000 
00 * 11 = 0000 
01 * 00 = 0000 
01 * 01 = 0001 
01 * 10 = 0010 
01 * 11 = 0011 
10 * 00 = 0000 
10 * 01 = 0010 
10 * 10 = 0100 
10 * 11 = 0110 
11 * 00 = 0000 
11 * 01 = 0011 
11 * 10 = 0110 
11 * 11 = 1001 



Table Qshows the truth table for a two by two bit multiplier circuit. In the 
context of evolving multiplier circuits, the training set consists of all possible 
permutations of the inputs. Evolving this circuit in a single run requires a circuit 
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with four inputs and four outputs. If we instead would like to apply the increased 
complexity evolution we could have a partitioning like the one given in Table 0 
This is for illustration only since there would be no problem in evolving such 
a small truth table in a single run. In this case, we would evolve a separate 
circuit for each of the four outputs - partitioned training vector (PTV). Further, 
partitioned training set (PTS) is also applied and divides the training set into 
two parts. This means that we end up with evolving eight subsystems in total 
(PTV equal to four multiplied by PTS equal to two) . The most significant input 
bit (a;i) is used to distinguish between the two training set partitions. Thus, 
the final system can be assembled together - as seen in Fig. 0 to perform a 
correct multiplier function. This requires during evolution that the truth table 
vectors are assigned into two groups each applied to a evolve its separate groups 
of subsystems (A and B) as the horisontal line (and x\) in Table 0 indicates. 
Within each such group there is no demand for sorting the vectors. 

Table 2. The truth table for multiplying two by two bits. 



X\Xo 
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yiyo 


23 


22 
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20 


00 
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00 
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0 
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00 


* 


01 


0 


0 


0 


0 


00 
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10 
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00 
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11 
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01 


* 


00 
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01 
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01 
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1 


01 
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10 
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01 
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11 
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10 
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00 
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10 
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10 
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10 
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11 
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00 
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11 
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01 
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11 
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10 
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11 
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11 
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1 



The amount of hardware needed would be larger than that needed for evolv- 
ing one system in a single run. However, the important issue - as will be shown 
in this paper, is the possibility of evolving circuits larger than those evolved by 
a single run evolution. 

The number of generations required to evolve a given subsystem would be 
highly dependent on the training vectors for the given training set partition. 
Thus, to minimize both the evolution time and the amount of logic gates, an 
algorithm should be applied to find both the partitioning size of the training 
set as well as the partitioning of each training vector. The best way this can 
be undertaken is probably by starting with a fine grain partitioning to be able 
to find a working system - even though the amount of logic would be large. 
Then, the training vectors for consecutive subsystems that was easily evolved 
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Fig. 1. An assembly of subsystems. 



can be merged and applied to evolve one new subsystem that can substitute 
the group of subsystems earlier evolved. This, will reduce the hardware size. 
Another optimizing feature to be included is to look into the training set from 
the beginning to make dedicated partitions. That is, if a set of consecutive input 
vectors result in the same output value, there is no need to actually evolve a 
subsystem for this. In the following experiments we apply fixed partitioning sizes, 
however results will probably indicate that it will be helpful with an adaptive 
partitioning. 



2.2 Target Hardware 

Each subsystem evolved consists of a fixed-size array of logic gates. The array 
consists of n gate layers from input to output as seen in Fig. Q Each logic 
gate (LG) is either an AND or an XOR gate. Each gate’s two inputs in layer I is 
connected to the outputs of two gates in layer l — l. In the following experiments, 
the array consists of 8 layers each consisting of 16 gates. A 5 x 5-bit multiplier 
consists of 10 outputs which we are going to evolve by partitioning as described 
in the example above. Thus, only one gate is used in the layer 8 of the gate array. 
This results in a maximum number of gates for each subsystem: Nss= 4-16-1- 
84-4-1-2-1-1 = 79 gates. 
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Inputs 




Outputs 



Fig. 2. The architecture of a gate array subsystem. 



The function of each gate and its two inputs are determined by evolution. 
The encoding of each logic gate in the chromosome string is as follows: 



Input 1 (4 bit) Input 2 (4 bit) Function (I bit) 



For the given array the chromosome string length becomes 1017 bit long which 
earlier experiments have shown to be appropriate. 



2.3 GA Parameters and Fitness Function 

Various experiments were undertaken to find appropriate GA parameters. The 
ones that seemed to give the best results were selected and fixed for all the 
experiments. This was necessary due to the large number of experiments that 
would have been required if GA parameters should be able vary through all the 
experiments. The preliminary experiments indicated that the parameter setting 
was not a major critical issue. 

The simple GA style - given by Goldberg [II YJ . was applied for the evolution 
with a population size of 50. For each new generation an entirely new population 
of individuals is generated. Elitism is used, thus, the best individuals from each 
generation are carried over to the next generation. The (single point) crossover 
rate is 0.5, thus the cloning rate is 0.5. Roulette wheel selection scheme is applied. 
The mutation rate - the probability of bit inversion for each bit in the binary 
chromosome string, is 0.005. 

The fitness function is computed in the following way: 

r 0 if y yf d 

F = ^ ^ X where a;=<lify = d = 0 (1) 

vec outp (^2ify = d= l 

For each output the computed output y is compared to the target d. If these 
equal and the value is equal to zero then 1 is added to the fitness function. On the 
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other hand, if they equal and the value is equal to one, 2 is added. In this way, 
an emphasize is given to the outputs being one. This has shown to be important 
for getting faster evolution of well performing circuits. The function sum these 
values for the assigned outputs (outp) for the assigned truth table vectors (vec) . 

The proposed architecture fits into most FPGAs (Field Programmable Gate 
Arrays). The evolution is undertaken off-line using software simulation. However, 
since no feed-back connections are used and the number of gates between the 
input and output is limited, the real performance should equal the simulation. 
Any spikes could be removed using registers in the circuit. 

Due to a large number of experiment (some requiring many generations to 
evolve) there has so far only been time to evolve one run of each reported ex- 
periment. 

3 Results 

This section reports the experiments undertaken to evolve multiplier circuits in 
an efficient way. 




123456789 10 

Output bit number 



Fig. 3. Results of evolving one output at a time with 32 training set partitions (each 
column sums the number of generations required to evolve each output bit). 



The experiments are based on using fixed partitioning sizes of the training 
set as well as evolving for one output bit at a time. Fig. Qshows the results of 
evolving with 32 training set partitions. That correspond to 32 vectors from the 
truth table being used for evolving each subsystem. Each stacked column is the 
sum of the number of generations used for evolving subsystems for each of the 
32 partitions for one output bit. Output bit number 1 is the most significant 
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bit while bit number 10 is the least significant bit. For every partition it was 
possible to find a correctly working subsystem. This was by a singleQrun of each 
subsystem evolution, rather than a selection among multiple runs. Thus, the 
experiment has shown that by the given scheme it is possible to evolve multipliers 
larger that those reported earlier. There should be no limit in applying this 
approach for evolving even larger multiplier circuits. However, the amount of 
hardware will also probably then have to be substantially increased. 

It is most difficult to evolve the five most significant bits with bit 4 requiring 
the largest number of generations (close to 1,000,000 generations). Bit 10 is the 
easiest to evolve and requires only 102 generations in total. This latter case would 
be a waste of gate resources since it (in another experiment) was evolved in a 
single run in 71 generations. That is, this system was evolved without training 
set partitions at all. The results indicate the need for an adaptive size in training 
set partitioning. 
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Fig. 4. Results of evolving one output at a time with 16 training set partitions (each 
column sums the number of generations required to evolve each output bit). 



To show how the “difficult” parts of the truth table become harder to evolve 
with a smaller number of training set partitions, an experiment with 16 training 
set partitions was conducted. As seen in Fig. ^ output bit 3, 4 and 5 now require 
close to 3 million generations to evolve. Moreover, there was four instances (one 

^ With one exception - that reached the maximum 200,000 generations. It was easily 
evolved in a second run. 
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for each output bit 3, 4, 5 and 6) where no correctly working circuit was found 
in 1 million generations. Thus, here we see the need for small partitioning sizes 
to be able to find a correctly working circuit. 

Computing the total number of generations results in 2.2 million for 32 par- 
titions and 13.7 million for 16 partitions, respectively. Thus, about six times 
less number of generations are required with the cost of twice the circuits size. 
However, the important issue is that, applying the larger number of partitions 
is needed to find a correctly working circuit. The results correspond well with 
the earlier experiments for other applications mentioned in the introduction. 

Even though it has been shown successful to evolve a large multiplier circuit, 
a 5x5 multiplier circuit is still much smaller than what can be designed in the 
traditional way. Thus, future work should concentrate on researching extension 
of this and other techniques for achieving automatic design of even larger logic 
circuits. 

4 Conclusions 

This paper has presented how incremental evolution can be applied for evolving 
multiplier circuits. The scheme is focused on evolution time and evolvability 
rather than minimizing hardware. Experiments verify that this is beneficial for 
solving more complex problems. A 5x5 multiplier circuit was evolved which is 
larger than any other reported evolved multiplier circuit. 
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Abstract. The evolution of digital circuits performing built-in self-test 
behaviour is attempted in simulation for a one bit adder and a two bit 
multiplier. Promising results show evolved designs can perform a better 
diagnosis using less resources than hand-designed equivalents. Future 
extensions of the approach could allow the self-diagnosis of analog circuits 
under failure and abnormal operating conditions. 



Motivation 

Self-diagnosis is important especially in mission critical systems exposed to radi- 
ation. Built-in self-test (BIST) is widely used yet commonly requires more than 
100% overhead as in voting based systems [1, 14, 17] or off-line testing [23, 32]. 

Evolutionary methods [6, 9, 13] applied to hardware have produced circuits 
comparable to those designed by experts [12, 18, 20, 30] and also unconventional 
circuits [15, 26] in which hardware resources are used extremely efficiently. More- 
over, many evolved systems in nature exhibit self-diagnostics such as the lym- 
phatic system [2]. 

All this leads to the possibility that evolutionary methods could explore areas 
of design space which reuse hardware components so that they contribute both 
to the circuit’s main functionality and its BIST, leading to a low overhead on-line 
solution. 

1 Introduction 

Traditional approaches to BIST use the Test Pattern Generation (TPG) - De- 
sign Under Test (DUT) - Test Response Evaluation (TRE) model, with the first 
and last being implemented using Linear Feedback Shift Registers (LFSR) [5]. 
Variations exist such as hierarchic, test-per-scan, BILBO (built-in logic block 
observer), PRPG-MISR (pseudo random pattern generator - multi input sig- 
nature register), circular BIST, and Reconfigurable Matrix Based Built-In Test 
Processor (RMBITP) [7,23,32]. Even though techniques such as RMBITP are 
successful at providing BIST for designs as large as 5 million gates with only 
around 11% overhead, they all suffer from two main disadvantages. The first is 
that they require the circuit’s operation to go off-line periodically to feed in the 
test patterns. The second is a bootstrapping problem: if the testing logic fails 
we will never know if the rest of the circuit is functioning properly. 



A.M. Tyrrell, P.C. Haddow, and J. Torresen (Eds.): ICES 2003, LNCS 2606, pp. 238—248, 2003. 
(c) Springer- Verlag Berlin Heidelberg 2003 
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Voting systems solve both these problems because on one hand faults are 
detected immediately during on-line operation and on the other hand the testing 
logic is usually small - all at the expense of complete redundant copies of the 
main circuit. Redundancy with on-line checking can be at the level of cells in 
a VLSI array [14, 17] or at the level of circuit modules, which could even be 
diverse designs of the same module, failing in different ways [1]. These, and other 
techniques for on-line diagnosis [3,4,21], share the problem that the benefits of 
self-checking must outweigh the higher fault rate due to increased silicon area. 

There are examples in the evolutionary electronics literature of designs that 
operate in surprising or intricate ways [8, 12, 16, 18, 19, 22, 24]. This paper makes 
a first attempt to apply this creativity to the design of circuits having BIST. 
To establish a proof of principle two simple digital design problems were chosen, 
which may illuminate the following issues: 

1. Can evolutionary search reach solutions to the BIST problem? 

2. Do these solutions reuse logic for the main task and BIST? 

3. Are these solutions competitive with conventionally designed ones? 

4. Are there any principles of operation we could extract from them, perhaps 

to add to our own conventional design toolset? 

5. Does this method scale up for larger problems? 

Section 2 will describe the GA, the simulator, the fault model, the tasks to be 
evolved and the fitness evaluation mechanism for BIST. Section 3 presents the 
results achieved while section 4 discusses what was learned and future avenues. 

2 Method 

2.1 The Genetic Algorithm 

A generational Genetic Algorithm (GA) is used with a population size of 32 
with 2 elites where 60% of the next generation is created through mutation and 
the rest by single-point crossover. Following from earlier work [26] we adopted 
the model of a small genetically semi-converged population evolving for many 
generations. Fitness ranges from 0 (worst) to 1 (best). An adaptive mutation rate 
is used, analogous to a Simulated Annealing strategy moving from “exploring” 
to “exploiting” . For each individual exactly m mutations are made, where m = 
\_k^ X In (l//)J + Wfloor, kr = in™//°^- ) ’ Current average fitness, /min is 

minimum possible fitness above 0, msoor = 1 is the number of mutations applied 
as / reaches 1, and mroof = 10 is the number of mutations that would be applied 
if / = /min- This assumes that / > 0 which is safe under the settings used in 
this paper. Informal preliminary experiments indicated solutions were found in 
less generations than with a constant mutation rate. Linear rank selection is 
used, such that the elite has twice the selective advantage of the median of the 
population. 

The genotype-phenotype mapping used is similar to [18] excepting that: the 
locations of circuit outputs are fixed, there are no limitations on connectivity 
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allowing sequential circuits, and the genotype is encoded in binary. The genotype 
length depends on the task being evolved. To allow implementation across a 
network of processors, an island based model was used [25] with a low migration 
rate. This population structure may have aided the search, but the details are 
not thought to be crucial to the results. 



2.2 The Simulator 

The simulator used is a simple version of an event driven digital logic simulator 
in which each logic unit is in charge of its own behaviour when given discrete 
time-slices and the state of its inputs. Logic units are Look-Up Tables (LUT) 
of two inputs capable of representing any two input logic gate. Any unit can 
be connected to any other allowing sequential circuits, so care must be taken to 
update all units “simultaneously”. This is achieved by sending the time-slices 
to the logic units in two waves: the first to read their inputs and the second to 
update their outputs. During each evaluation, circuit inputs were kept stable for 
20 time-slices and the outputs were read in the second half of this cycle allowing 
them time to settle. 

Gate delays are simulated in time-slice units and are randomized with a 
Gaussian distribution {[fi = 1.5, cr^ = 0.5J). This amounts to a noisy fitness 
evaluation with the intention to facilitate transfer of sequential circuits to real 
hardware as in a “Minimal Simulation” [10]. At the start of each generation, e 
different sets of logic delays are generated, simulating e different variations to the 
circuit delays from manufacturing or environmental variation. Each individual’s 
performance is then evaluated in each of the e conditions, and its fitness is the 
mean value. The number e is occasionally adjusted by hand during the run, such 
that more evaluations are used as the population leaves the “exploring” stage 
and enters the “exploiting” stage. It was set such that twice the standard error 
of the series of fitness trials is significantly smaller than the difference in fitness 
between adjacent individuals in the rank table with non-equal fitnesses. 

The Single-Stuck-At (SSA) Fault model was chosen because it simulates the 
most common type of on-line failure - that produced by radiation hazards in the 
absence of mishandling or manufacturing defects [23, 33] . It is also an industry 
standard and it has been shown that tests generated for SSA faults are also good 
at detecting other types of faults. SSA faults can be introduced at any of the 
logic units of the simulator simply by setting its output always to 0 or 1. 



2.3 The Tasks 

Two simple problems were chosen: a one bit adder with carry and a two bit 
multiplier. The adder was chosen as a small combinational circuit complex 
enough to be a good starting point for attempting to evolve BIST. It has 
three inputs A,B,Cm and two outputs S,Cout such that S = A (B B (B Cm 
and Cout = A- B + A- Cm + B ■ Cm- The multiplier, chosen as a step up from the 
adder, has four inputs AiAqBiBo and four outputs P3P2P1P0 where P = Ax B. 
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The evolving networks may be recurrent and could show an unwanted depen- 
dence on the order in which inputs are presented, and on the networks’ internal 
state. To demand insensitivity to input ordering, the same approach was taken 
as for the randomization of logic delays (above): at the start of each generation, 
e (the same number e defining the number of evaluations with random gate de- 
lays above) different orderings of the full set of possible inputs for that task were 
generated, and the individuals of that generation evaluated on all of them. On 
each of the e evaluations the circuit state was reset, then the ordering of the full 
set of inputs was presented twice in sequence, to prevent dependence on initial 
conditions. 

The task evaluation score was measured as follows. Let Qr be the concate- 
nation of the series of values at the output bit for the final 10 time-slices of 
the presentation of each input vector during an evaluation, and QJ, the desired 
response. We take the modulus of the correlation of Qr and Q'^., averaged over 
all N outputs: 



ft 



T,r=o^ \corr{Qr,Q'r)\ 
N 



( 1 ) 



2.4 Evolving BIST 

An extra output E was recorded from circuits with the aim that it would go 
high whenever a fault affected any other output. The performance of a circuit 
at its main task ft and at BIST behaviour ft, were evaluated separately. BIST 
behaviour itself was evaluated with two fitness measures: 

1. BIST per fault /b^: Let Uf be the number of faults affecting task performance 
for which none of the possible input vectors raises E. Then /bp encourages 
faults to be “detectable”: /bp = 1/ (1 -I- rtf x k{) where kf was chosen to be 
25, to give /bp good sensitivity when U{ is small. 

2. BIST per instance /b,: Let u, be the number of instances out of all possible 
combinations of SSA faults and input vectors, for which the task output is 
incorrect but E is low. Then /bj encourages immediate detection of faults: 
/bi = 1/ (1 -I- Mi X /ci) where k, was chosen to be 200. 

Notice that having a high /bp but low /bj is similar to off-line BIST solutions 
while having a high /b, is like an on-line BIST detecting faults at the first 
instance they affect circuit behaviour. 

Mf is measured by evaluating task fitness ft separately under all SSA faults to 
every unit able to affect the task outputs. The same set of e evaluation conditions 
chosen for the current generation is used. If ft falls by at least 0.001 due to a fault, 
then it is considered to affect task performance, u, is measured by comparing the 
output of the circuit under each fault and input vector to its output for the same 
input vector under no faults. The output is deemed unaffected if it is the same at 
steps 10, 15 and 20 of the 20 time steps for which inputs are stable. Hence most 
faults that induce oscillations will be detected. If during the simulated time E 
goes high for more than eSize consecutive time slices then it is considered to 
have gone high. eSize is set at 7: greater than average race conditions yet not 
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excluding a wide range of behaviours. A circuit exhibiting a high E when no 
faults were in place was deemed to have /bp = f hi = 0. 

It seems foolish to try to detect faults at every instance when some faults are 
not detected at any instance. Early runs (§3.1) incorporated the objectives into 
one overall fitness value, but with more emphasis on /bp than on /bj and more on 
ft than on /b: / = wt X ff + Wh x /b where /b = Wbp x /^^ +Wbi x fhi and weights 
Wt « Wbp ~ 0.85, Wh = I — Wt and Whi = 1 — iCbp. Later runs (§3.2) exploited 
our use of rank selection. The objectives were given the priority ft > /bp > fbi, 
and when sorting the individuals the comparison operator only considered an 
objective if higher priority objectives were equal. An extra objective encouraging 
parsimony was also used, having the lowest priority of all. 

3 Results 

3.1 One Bit Adder with Carry 

The maximum number of logic units was constrained to 13 and the genotype 
length was 156 bits. This circuit can be implemented with a minimum of five 
logic units. It is worth mentioning that the straightforward TPG-DUT-TRE 
BIST architecture (§1) would require 3 flip-flops and 2 Xor gates for the TPG 
LFSR and 2 flip-flops and 1 Xor for the TRE LFSR totalling 13 extra gates. Even 
if unusual to apply this method to circuits of this size, it is still valuable as a 
comparison of BIST quality and underlying operation principles because it is an 
industry standard. The minimal on-line BIST solution - a voter with two copies 
~ would require 8 extra gates, 5 for the extra copy and 3 for the comparison 
logic. 

A) Hybrid Minimalist. This run was seeded with the elite of generation 10100 
from a previous run which was started from a population of random individuals 
and did not take fhi into account. This elite was bred because it provided 82% 
off-line fault coverage being only 3 gates larger than the minimum adder. 

After 4000 generations the elite was the circuit shown in Fig. la that can be 
trivially pruned to 7 gates by skipping unit 5 as shown. This circuit detects all 
SSA faults affecting task performance save unit 1 SAO (clearly since E = uiAus) 
with either input test vector {A, B, Cm) = (0, 0, 0) detecting all SSA-1 faults or 
(1,0,0) detecting all SSA-0. Note these are the only input vectors detecting 
faults since E is low whenever B or Cm are high. Due to its short test pattern 
this could be considered as a hybrid of the on-line and off-line approaches using 
the best of both worlds. 

The formula for E is B V Cm A Cout and acts as a consistency check: “Cout 
cannot be high if both B and Cm are low” . The use of Xor gates - which propa- 
gate any bit flip ~ in the calculation of S, which is used for Coutj ensures all faults 
raise Cout when B and Cm are low and A is high or low, thus checking S im- 
plicitly. This method seems like a cross between arithmetic coding “checksums” 
and state monitoring. 

Table 1 compares this to conventional solutions. 
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Fig. 1. Evolved BIST solutions for the adder: (a) nearly full off-line fault coverage with 
2 extra gates and (b) full on-line solution with 3 extra gates. Unit 5 can be trivially 
removed in both - presumably the two necessary mutations would have eventually 
occurred. LUTs are displayed as their equivalent two input logic gate. 



B) On-Line full. About 15000 generations later in the same run the full on-line 
BIST solution shown in Fig. lb was the elite and can again be trivially pruned 
to 9 gates. The design can be decomposed into modules a (adder) and /3 (BIST). 
Sub-modules and /3c calculate Gout and Gout respectively while unit 2 (whose 
output is E) acts as a voter. Errors in S are detected because they affect and 
/?c differently. Here evolution has arrived at the mission critical system design 
concept of design diversity [1] and gone beyond by aiming for circuits having 
minimal re-design necessary for maximal fault-coverage. Even though it is not 
surprising that Oc and /3c are different, because there was no gene copy operator, 
the fact that their differences are exploited to evaluate the correctness of S and 
Gout simultaneously and under any fault and input vector is quite remarkable. 

E also went high whenever the circuit failed due to pathological gate delays 
{delay faults). Hence this method could also be useful to catch out deviations 
of standard behaviour due to abnormal operating conditions {parametric faults) 
which could be useful in unconventional evolved designs such as [26] . Sequential 
faults are caught since the circuits are tested under more than ten different 
random orderings of the full input test sequence at each evaluation. Being an 
on-line solution transient faults will also be detected as they look like a temporary 
SSA. 

Table 1 compares this to conventional solutions. 



Adders Discussed. Evolutionary methods have found better solutions in terms 
of standard BIST design criteria than those of conventional methods. It is up to 
hardware designers to consider it worthwhile to have BIST on circuits of this size. 
Other runs minimized a hand-designed voting BIST solution from 8 overhead 
gates disappointingly just down to 7, arrived at a nearly full on-line solution 
with 8 overhead gates which included a low-pass filter ironing out race conditions 
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Table 1. Comparison of test quality attributes between conventional and evolved 
BIST solutions to the adder problem. Fault types; C=combinatorial, S=sequential, 
T=transient, D=delay. Test controller logic is not taken into account for off-line solu- 
tions. 



Quality attribute 


TPG-DUT-TRE 


Evolved A 


Voting 


Evolved B 


Fault Coverage 


100% 


90% 


100% 


100% 


Test Scope 


module 


module 


module 


module 


Fault types 


C 


CS 


GST 


CSTD 


Area overhead 


13 units 


3 units 


8 units 


4 units 


Pin overhead 


2 (start/error) 


2 


1 


1 


Perl, penalty 


0 


1 gate delay 


0 


1 gate delay 


Test time 


8 test vectors 


2 test vectors 


0 (on-line) 


0 (on-line) 



because it was evolved with a small eSize (meaning it could be clocked faster 
when deployed), evolved a full on-line solution with 6 overhead gates exhibiting a 
linear relation between the minimum eSize it would work with and the standard 
deviation <t^ of its delays, and evolved a circuit very similar to the one in Fig. 
lb from a random initial population. 

3.2 Two Bit Multiplier 

The maximum size allowed for these circuits was 28 LUTs and the genotype 
length was 392 bits. The smallest implementations of this circuit use 7 two 
input logic gates [18], such as shown in Fig. 2a. Even though the difference in 
size between the adder and multiplier is not great, there is twice the number of 
outputs to check: a TPG-DUT-TRE BIST approach would require 11 gates for 
each of the LFSRs totalling 22 extra gates while a voter would require 7 gates 
for the comparators again totalling 14 extra gates. 

Note that evolving this kind of circuit, even without any BIST functionality 
uses a considerable amount of CPU time as this seems to grow exponentially 
with the complexity of the specification [30] . 



Off-Line Full. This run was seeded with the conventionally designed mul- 
tiplier shown in Fig. 2a. After around 150000 generations the circuit of Fig. 
2b was the elite (after pruning as in Fig. 1). This is nearly equal to the seed 
except that one input of unit 0 comes from unit 2 instead of unit 5, which 
implements a similar strategy to that used in §3.1: one output is influenced 
by others through their reuse in a cascading fashion - here output P 2 is cal- 
culated using P3 which is calculated using Pi - so only this output need be 
checked. This is achieved by unit 14, which compares P 2 to a recalculation of 
its inverse by units 18 and 27. Only these gates (14, 18, 27) are used for BIST 
behaviour achieving full off-line fault coverage with the input vector pattern 
set (Ai, Aq, Bi,Bq) = {(0, 1, 1, 1), (1, 0, 1, 1), (1, 1, 1, 1)}, reducing the size of the 
LFSR to 5 gates while the signature analyzer LFSR is replaced by the error line. 
This run has shown that this method can also be used to add BIST behaviour 
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Fig. 2. ( a) Hand designed two bit multiplier extracted from Fig. 15a of [18] is (b) post- 
evolved to incorporate BIST with 100% off-line coverage using only 3 gates overhead. 



to existing hand designed circuits without substantial modifications to circuit 
structure thus leaving its properties unaltered. 



4 Conclusion 

A proof of principle has been established and the questions set out in §1 can 
be answered: circuits with BIST behaviour are within the reach of evolutionary 
search, can reuse components for the main task and BIST functionality, and they 
are competitive in overhead both to off-line and on-line solutions. Low overhead 
on-line and hybrid solutions have been found. These solutions would function 
correctly in real hardware because the slightly unconventional simulator does 
capture the processes and variations influencing combinatorial circuits. Evolu- 
tion has explored design space containing established methods for BIST design 
and beyond. Some solutions use a “checksum” formula on the current circuit 
state to evaluate correctness (comparable to the “Immunotronics” [3] approach 
for sequential systems which evaluates correct circuit state transitions), others 
increase testability by cascading outputs so that errors are propagated to a single 
output and others exploit design diversity to minimize redundancy in a voting 
system. These methods could prove useful to be adopted by designers. For cir- 
cuits larger than those considered here, it is still unclear whether the enhanced 
BIST strategies produced by evolution would be worth the computational ef- 
fort needed to produce them since the evolution of large BIST circuits faces the 
same problems - and perhaps, solutions - as with other circuits [11,28,29,31]. 
However, Vassilev et al. [30] have suggested that when large circuits can be 
evolved (perhaps from a hand-designed seed), their size leads to greater scope 
for evolutionary optimization. 

Most cases arrived at modular designs (as in [18, 27]). Further understanding 
of this effect may aid future experiments. The modularity of the circuits can be 
useful in identifying design patterns or principles. 

Any single circuit will have its own characteristics for which a particular 
BIST strategy is most suited. Evolution through blind variation and selection 
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is capable of searching for this strategy without constraints. The evolved BIST 
circuits seem well suited to uncovering delay faults and may be useful for para- 
metric faults such as those occurring in [26]. The evolution of self-diagnosing 
analog hardware is an interesting possibility where the E line could give an esti- 
mate of how wrong the outputs are. Future work could also include the adoption 
of a more comprehensive fault model where the memory of the LUTs is affected 
- as in FPGAs - and other common faults are simulated, and tackling more com- 
plex tasks such as sequential circuits - synchronous and asynchronous - perhaps 
using techniques set out in [11, 28, 29, 31]. 
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Abstract. This paper presents a genetic algorithm (GA) that solves the problem 
of routing a multiplexer network into a MUXTREE embryonic array. The pro- 
cedure to translate the multiplexer network into a form suitable for the GA- 
based router is explained. The genetic algorithm works on a population of con- 
figuration registers (genome) that define the functionality and connectivity of 
the array. Fitness of each individual is evaluated and those closer to solving the 
required routing are selected for the next generation. A matrix-based method to 
evaluate the routing defined by each individual is also explained. The output of 
the genetic router is a VHDL program describing a look-up table that receives 
the cell co-ordinates as inputs and returns the value of the corresponding con- 
figuration register. The routing of a module- 10 counter is presented as an ex- 
ample of the capabilities of the genetic router. The genetic algorithm approach 
provides not one, but multiple solutions to the routing problem, opening the 
road to a new level of redundancy where a new “genome” can be downloaded 
to the array when the conventional reconfiguration strategy runs out of spare 
cells. 



1 Introduction 

The Embryonics project was originally proposed by Mange et al [1] from the Ecole 
Polytechnique Federate de Laussane in Switzerland, and soon after the University of 
York, UK, joined the efforts under the lead of Tyrrell [2]. During the past few years 
research on Embryonics has gained momentum and the first practical demonstrations 
of the technology have already been shown [3, 4]. Embryonics proposes the incorpo- 
ration of biological concepts like growth, reproduction and healing into the realm of 
silicon processor-arrays [5]. One of the possible Embryonics implementations is the 
MUXTREE architecture [6, 7]. In MUXTREE implementations, a logic function is 
translated from truth tables into multiplexer networks via ordered binary decision 
diagrams (OBDDs). The multiplexer network has to be mapped into the structure of 
an embryonic array, which is a 2-D array of processing cells. Every cell performs the 



A.M. Tyrrell, P.C. Haddow, and J. Torresen (Eds.): ICES 2003, LNCS 2606, pp. 249-261, 2003. 
© Springer-Verlag Berlin Heidelberg 2003 




250 Cesar Ortega-Sanchez, Jose Torres-Jimenez, and Jorge Morales-Cruz 



function of a two-input, one-output multiplexer. Connectivity between cells is limited 
to the nearest neighbours, and the number of data lines connecting cells to each other 
are scarce. Inputs to the multiplexer and routing of signals within each cell are set by 
a configuration register. In the MUXTREE architecture, each cell of the array con- 
tains a copy of the configuration registers of all the cells in the array [8]. 

The set of all configuration registers in an array is called “the genome” of the ap- 
plication. In each cell, a configuration register is selected by a unique pair of co- 
ordinates. When a cell fails, it is possible to reconfigure the array by changing the co- 
ordinates of cells so that the failing cell is logically eliminated and substituted by a 
healthy one. By these means embryonic arrays achieve fault-tolerance [9]. 

Connectivity limitations make mapping the multiplexer network into the embry- 
onic array a task that grows in difficulty as the complexity (size) of the application 
grows. For small functions, it is possible, although time-consuming, to manually route 
the multiplexer network into the embryonic array. However, the time (and patience!) 
required to route medium-size applications is unacceptable for all practical purposes. 

Given that Embryonics is a relatively new field, it lacks the CAD tools available 
for other, more mature bio-inspired technologies (artificial neural nets and genetic 
algorithms, for example). In its present state, MUXTREE applications can be synthe- 
sised and simulated using FPGA design software, like Xilinx’s Foundation. However, 
the early stages of a design have to be developed by hand. 

This paper presents a recently developed design tool that helps the designer in what 
is considered the most challenging and time-consuming task during the development 
of a MUXTREE application: the mapping of a multiplexer network into the fixed 
structure of an embryonic array. At the core of the router is a genetic algorithm (GA) 
that selects from a multitude of possible routings those that satisfy the connectivity of 
the multiplexer network given as input. 

The paper is organised as follows: Section 2 introduces the MUXTREE architec- 
ture and the typical design flow of an application. Section 3 describes the genetic 
algorithm that solves the routing. In Section 4, a module- 10 counter is presented as an 
example of the capabilities of the genetic algorithm employed as router. Results ob- 
tained with other examples are also presented. Section 5 resumes the conclusions and 
future work of this research. 



2 Design Flow of a MUXTREE Application 

Although the processing power of an individual MUXTREE cell is that of a multi- 
plexer, an array of such cells can contain a multiplexer network representing any 
combinational or sequential logic function. The size of the network would be limited 
only by the physical dimensions of the array (allowing, of course, spare elements 
necessary for the embryonic reconfiguration). 

Figure 1 shows a simplified diagram of the MUXTREE embryonic cell. A detailed 
description of the architecture can be found elsewhere [8]. 
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Fig. 1. Simplified diagram of a MUXTREE embryonic cell 



In a typical application, the following steps are necessary to implement a combina- 
tional or sequential logic function using a MUXTREE embryonic array: 

1 . Describe the application as a set of logic functions or as a truth table. 

2. Erom the description, construct an Ordered Binary Decision Diagram (OBDD). 

3. Represent the OBDD as a multiplexer network. 

4. Route the multiplexer network in an array of MUXTREE cells. 

5. Obtain the configuration register of all the cells in the array. 

6. Generate the “genome” of the application by grouping the configuration registers 
of all the cells in the array. 

7. Use the genome as input for the software that is used to synthesise and simulate the 
design. 

In a typical implementation, steps 1 to 6 are carried out by hand. Step 4 is the most 
difficult and time-consuming due to the limited interconnection between cells. A 
genetic algorithm-based, automatic router that implements steps 4 to 6 has been de- 
veloped. The following section describes this tool in detail. 



3 Routing Multiplexer Networks Using a Genetic Algorithm 

During the past few years, genetic algorithms have been intensively used to solve 
problems whose solution either cannot be found analytically, or numeric methods take 
too long to find [10]. GAs are very good at finding solutions in vast search-spaces 
because they are inspired in the process of natural selection, where individuals who 
are fitter than others have more chances of passing their genes to the next genera- 
tion [11]. 
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The following elements are needed to solve a problem using GAs: 

• A suitable representation of the problem so that an individual from a population 
can represent a possible solution. 

• A randomly generated population of possible solutions. 

• A fitness function that evaluates how close to solving the problem is every individ- 
ual in the population. 

• A selection criterion that discriminates individuals according to their fitness. 

• Some genetic operators (typically mutation and crossover) to generate diversity 
within the population and to prevent the solutions from being stuck around local 
minima. 

• A stop criterion that allows the system to decide when to finish the search. It can 
be either because a solution has been found, or because a specified number of gen- 
erations have been tested. 

Figure 2 shows a flow diagram representing the logical flow of a GA. 




Fig. 2. Flow diagram of a typical genetic algorithm. 

To solve the routing of multiplexer networks into MUXTREE arrays, the diagram 
in figure 2 was implemented in a C program. Three text input files indicate the topol- 
ogy of the target multiplexer network. Their content is presented next. 



3.1 Input Files 

Eor all practical purposes, a multiplexer network can be completely described by 
stating: the number of levels it contains, the number of multiplexers in each level and 
the inputs that arrive to each multiplexer (data and control). Additionally, in the 
MUXTREE architecture, it is necessary to distinguish the multiplexers that work in 
synchronous mode, i.e. with their outputs latched by a clock; from those that work 
asynchronously. The following example shows a simple multiplexer network and the 
input files associated to it. 

In file number one, parameters pertaining to the GA are provided along with the 
size of the MUXTREE array that should contain the multiplexer network. Also the 
features that characterise the target network are given. Figure 3 shows the multiplexer 
network of a 3 -input voter, along with the text of input file 1. 
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0.8 


/Crossover probability 


0.15 


/Mutation probability 


2 3 3 


/Size of destination array 2x3. 
3 multiplexers in the network. 


2 2 1 


/Two levels with two multiplexers 
in the first level and one in the 




second level 


0 


/No synchronous multiplexers 


1 2 


/Number of outputs and tags of the 
multiplexers that carry the final 
output lines. 



Fig. 3. 3-input voter multiplexer network and its input file 1 for the genetic router 



The inputs for each multiplexer are indicated in a second input file. The file corre- 
sponding to the example in figure 3 is presented next. The order of inputs for each 
multiplexer is: Selection, Right input (1), Left input (0). In a MUXTREE array, mul- 
tiplexers are numbered from left to right and from bottom to top. 



'BAOBlACba 




Inputs for multiplexer 2 
Inputs for multiplexer 1 
Inputs for multiplexer 0 



A third input file indicates the multiplexers that receive input variables, if any. In 
the MUXTREE architecture, variables can be fed to the array only through row 0, i.e. 
the bottom row of the array. Each multiplexer can receive up to two variables. In 
some difficult designs, the same variable can be fed in more than one input in order to 
simplify the routing. Following is the text of the third input file for the example in 
figure 3. For demonstration purposes, input variable A is fed through the two cells in 
the bottom row (the array size is 2x3). 



' BACA ' 




Input variables in multiplexer 1 
Input variables in multiplexer 0 



Generation of the three input files can be easily automated, but at the present stage 
they are still hand-generated. 



3.2 Initial Population 

The function of a MUXTREE cell is defined by a 17-bit configuration register [5]. 
Eight of those bits define the routing of signals within the cell. This byte is called the 
routing byte. Therefore, the flow of signals within a particular array will be defined 
by the routing bytes of all cells in the array (the routing genome). The size of this 
genome in bits is 8 times the number of cells in the array. For a 2x3 array, its routing 
would be completely defined by 2x3x8= 48 bits; therefore, any randomly generated 
48-bit pattern will encode an arbitrary routing. In the genetic router proposed, every 
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individual of the population is a routing genome. In the examples presented next, the 
initial population consists of 100 randomly generated individuals. 



3.3 Fitness Function and Evaluation 

The problem of evaluation is to find out how close an arbitrary routing is from the 
routing needed to implement the target multiplexer network. To achieve this, a care- 
fully chosen fitness function must be used. 

Definition 1: Two multiplexers are equivalent when they have the same inputs in the 
same order. Such inputs are defined as correct inputs. 

Definition 2: Two multiplexers are partially equivalent when they have at least one 
input in the same position, i.e. at least one correct input. 

By these definitions, a given routed MUXTREE array would be able to implement 
the same function as the target multiplexer network when it contains a set of multi- 
plexers equivalent to those in the target network. In other words, the routing will be 
complete when it contains equivalent multiplexers for all multiplexers in the target 
network. Consequently, the fitness of a particular routing should be related to the 
number of equivalent and partially equivalent multiplexers. Hence, the fitness of a 
routing is defined as the number of correct inputs it contains. Since every multiplexer 
has three ordered inputs, the maximum fitness a routing can achieve is the number of 
multiplexers in the target network (n) multiplied by 3. 



To determine the fitness of a particular routing, it is necessary to locate all the cor- 
rect inputs it contains. Eor this purpose, a connectivity matrix is used to “simulate” 
the propagation of signals within the network. Before propagation, the multiplexers in 
the embryonic array have no particular correspondent in the target network. After 
propagation, corresponding multiplexers are determined according to the number 
correct inputs every embryonic multiplexer has. 

At the end of the evaluation process, every individual of a population will have a 
number associated to it. The bigger the number, the closer that routing is from solving 
the target network. For space reasons, a more detailed description of the connectivity 
matrix and the propagation process will be left for a future paper. 



3.4 Selection 

Selection is the process of choosing the individuals that will be parents of a new 
population. The genetic router presented in this paper implements selection by tour- 
nament, which consists in randomly selecting two genomes from the population and 
choosing the one with the highest fitness. For example. 



If 


f* > 


then: 


C. = G 


If 




then: 


C. = G 


If 


II 


then: 


C. = G 
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Where: 

= Fitness of genome A; = Fitness of genome B 

= Genome A; Gg = Genome B; = Chosen genome 



3.5 Crossover and Mutation 

Crossover and mutation are used in the genetic router to prevent solutions from clus- 
tering around local minima. 

Crossover consists of taking two ‘parents’ that have survived the selection process 
and mixing them to create two new individuals. Figure 4 shows two six-byte indi- 
viduals being crossed over in two randomly selected points. 





Breaking points 


Parent 1 


B1 


B2 


B3 


B4 


B5 


B6 








Parent 2 


BA 


BB 


BC 


BD 


BE 


BE 






JJ 




Child 1 


B1 


B2 


BC 


BD 


B5 


B6 








Child 2 


BA 


BB 


B3 


B4 


BE 


BE 





Fig. 4. Crossover between two individuals of a population 

The crossover points are selected in such a way that bytes remain unaltered so that 
well-routed cells are preserved from one generation to the next. 

Crossover is performed according to a crossover probability. Every time a cross- 
over is going to take place, a random number is generated; if the number is smaller 
than crossover probability, then the crossover takes place, otherwise both parents pass 
to the next generation unchanged. In the examples presented in this paper, crossover 
probability is 0.8. 

It is possible to apply mutation to children generated by crossover. By means of a 
mechanism similar to the one used in crossover, a mutation probability will define the 
possibility of mutation. Mutation probability in the examples of this paper is 0.15. 
When mutation is applied, the byte and bit to be mutated are selected randomly. Mu- 
tation inverts the logic value of the selected bit. 



3.6 New Population 

A new population is generated with the individuals selected for their fitness. It is 
possible that two new individuals end up being identical after they have passed 
through crossover and mutation. If that is the case, one of the repeat individuals is 
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eliminated from the population and a brand new individual is randomly generated. All 
individuals in the new population are different to one another. 



3.7 Output File 

The genetic algorithm that solves the routing of multiplexer networks into 
MUXTREE arrays stops searching when one of the following conditions is met: A 
solution has been found, or the search has run a predetermined number of generations 
whether or not a solution has been found. The latter case can end up with none, one or 
multiple solutions to the routing. 

If at least one solution is found, the genetic router generates a VHDL file contain- 
ing the description of a look-up table that receives at its inputs the co-ordinates of a 
cell and returns the configuration register associated to that cell. This file has to be 
integrated to the design of the MUXTREE array that will implement the desired func- 
tion. In section 4 there is a VHDL file generated by the genetic router. 

If multiple solutions are found, a new level of fault-tolerance could be introduced 
to MUXTREE arrays. In its present implementation, MUXTREE arrays are disabled 
when spare cells have ran out and a new fault arises. However, with multiple routings 
capable of implementing the logic functions represented by the multiplexer network, 
it would be possible to download a new genome to the array every time spare cells 
run out. It is possible that one of the “spare genomes” can still implement the desired 
function. 



4 Example: Routing of a Module-10 Counter 



To demonstrate the functionality of the genetic router, the implementation of a mod- 
ule- 10 counter is presented next. Table 1 shows the truth table of the counter and 
figure 5 the multiplexer network that implements it. 



Table 1. Truth table of a module- 10 counter 
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0 


0 


1 
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0 


0 


0 


other combinations 


0 


0 


0 


0 



Table 2 presents the text of the three input files according to figure 5. The content 
of the third file is only a “0” because in a counter there are no external inputs, i.e. the 
application is purely sequential. This is indicated by the flip-flops at the outputs of the 
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multiplexers that deliver the counting function in figure 5. These outputs are latched 
by a clock common to all cells. 




Fig. 5. Multiplexer network that implements a module-10 counter 



Table 2. Input files for the genetic router 



First input file 


Second input file 


Third input file 


0 . 8 
0 . 15 
4 5 15 

4 4 5 4 2 

4 2 10 13 14 

4 2 10 13 14 


1 AO 1 AO 1 AO 1 AOOB A 1 BaABbOB 
AcBOdCefDOOCghCOiDOjDk 


0 



The following code is the VHDL file that the genetic router automatically gener- 
ated when found the routing for the module- 10 counter. The value of the configura- 
tion register for spare cells is specified by the case “when others”. 

library IEEE; 
entity Mem_counter9 is 
port ( 

okaux: in STD_LOGIC; 

xy: in STD_LOGIC_VECTOR (7 downto 0) ; 
conf: out STD_LOGIC_VECTOR (16 downto 0)); 
end Mem_counter9 ; 

architecture Mem_counter9_arch of Mem_counter9 is 

type REG is array (16 downto 0) of bit; 

begin 

process (okaux, xy) 
begin 

if ( okaux = 'O' ) then 

conf <= "01001000100000000"; 
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else 

case xy is 



when 


"00000 


0 


0 


0" 


= > 


conf 


<= 0 
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0 


1 


0 


1 


0 
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0 


1 
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1" ; 


when 
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end case; 
end i f ; 
end process; 
end Mem_counter9_arch; 



Figure 6 shows part of the simulation of the module- 10 counter implemented in a 
MUXTREE array. Synthesis and simulation were carried out in Xilinx’s Foundation. 
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Fig. 6. Simulation of the module- 10 counter implemented in a MUXTREE array 



Figure 7 shows a graphical representation of one of the solutions found by the ge- 
netic router. The clock input is common to all cells. Solid lines represent fixed con- 
nections and broken lines are the connections programmed by the genome of the 
module-10 counter. Numbers in the multiplexers correspond to those in figure 5. A 
careful visual inspection of figure 7 will demonstrate that in fact, the multiplexer 
network in figure 5 is contained in the embryonic array presented. 
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4.1 Results with Other Examples 

Table 3 resumes the results obtained when routing other applications. The column 
entitled Type indicates whether the application is combinational (C) or sequential (S). 
The Array size column indicates the size of the MUXTREE array that contained the 
application. The following column contains the size of the multiplexer network, fol- 
lowed by the maximum number of generations that the genetic algorithm searched for 
the solution. The following column indicates in which generation the first solution 
was found, followed by the time taken to find it. Next column presents the number of 
valid solutions delivered by the genetic router when ran the maximum number of 
generations. The last column presents the probability of successfully routing the cor- 
responding application, e.g. the genetic router will find a solution to the 3-bit up/down 
counter, 7 out of 10 times that the program runs 10,000 generations. 



5 Conclusions and Future Work 

The genetic router presented in this paper is capable of mapping a given multiplexer 
network into a MUXTREE array. The router could also he applied to solve routing 
problems in networks presenting a topology similar to that of MUXTREE arrays. 
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Table 3. Results obtained when routing other applications. 



Application 


Type 


Array 

size 


Mux-net 

size 


Num. of 
gen. 


Gen. of 
1“ sol. 


Time to 
1“ sol. 


Num. of 
solutions 


Routing 

prob. 


3 -hit voter 


C 


2x2 


3 


100 


1 


0.5 seg 


82 


1 


3-hit U/D counter 


S 


3x3 


5 


10000 


250 


1.2 min 


85 


0.7 


4-hit CRC generator 


C 


2x6 


10 


10000 


80 


45 seg 


78 


0.5 


2-hit complete adder 


c 


4x8 


15 


30000 


8500 


60 min 


75 


0.4 


4-hit parity generator 


c 


4x8 


15 


30000 


2500 


25 min 


80 


0.9 


5-hit parity generator 


c 


5x6 


11 


20000 


450 


8 min 


76 


0.2 


Module- 10 counter 


s 


4x5 


15 


30000 


6500 


60 min 


80 


0.5 



The multiple solutions delivered by the genetic router open the possibility of a new 
level of redundancy in applications requiring high levels of availability. Different 
routings could be tried in the same hardware until functionality of the array is re- 
stored. 

There still are questions regarding the capabilities of the genetic router that require 
further research. How efficient in terms of resource-usage are the solutions found by 
the router? What is the maximum number of multiplexers that the router can solve? 
How sensitive the router’s performance is to changes in the GA parameters? How 
well the genetic router compares against other routing techniques? [12-15] Embryon- 
ics is a very vast field where much research remains to be done. 
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Abstract*. Integrated-circuit producers are shoved by competitive pressure; 
new devices require increasingly complex verifications to be performed at in- 
creasing pace. This paper presents a methodology to automatically induce a test 
program for a microprocessor that maximizes a given verification metric. The 
methodology is based on an auto-adaptive evolutionary algorithm and exploits a 
syntactical description of microprocessor assembly language and an RT-level 
functional model. Experimental results clearly show the effectiveness of the ap- 
proach. Comparisons reveal how auto-adaptive mechanisms dramatically en- 
hance both performances and quality of the results. 



1 Introduction 

Today competitive pressure compels all integrated-circuit producers cutting time to 
market. Engineers are required to design more and more complex devices, and all 
devices must be carefully checked before starting production. Thus, the current trend 
requires increasingly complex verifications to be performed at increasing pace. De- 
sign verification is a process similar to software debugging: device functionalities are 
tested, and characteristics examined for determining possible inconsistencies. 

Concerning microprocessors, it has been maintained that about a third of the cost 
of developing a new one is devoted to this hardware debugging and testing [1]. Nev- 
ertheless, the inadequacy of existing verification methods is clearly illustrated by the 
Pentium’s FDIV error, which cost its manufacturer an estimated $500 million. 

The verification process of a microprocessor requires a deep knowledge of the in- 
struction set and instruction format since only correct programs can internally perform 
meaningful operations. To verify microprocessor design, manufacturers usually rely 
on simulating the execution of test programs meticulously written by hand. Flowever, 
hand-written test cases can be only exploited as a first line of defense against bugs, 
since they focus on basic functionalities and important but rarely-occurring corner 
cases. During the whole design process, exhaustive or nearly-exhaustive tests are 
often necessary, but the effort required to generate them manually may be practically 
unworkable. 
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Today, advances in simulation and emulation technology enabled the use of other 
sources of test stimuli such as existing application and system software [1]. Addition- 
ally, to increase test productivity sophisticated test-generation systems have been 
proposed [2] [3]. Nevertheless, although all these approaches can significantly in- 
crease design productivity, they are still biased towards corner cases, far from being 
fully automated and not broadly exploitable. 

This paper presents an innovative methodology for generating a test case for vali- 
dating a microprocessor core. An auto-adaptive evolutionary algorithm is exploited 
for generating an assembly program able to maximize a predefined verification met- 
ric. The algorithm relies on a description of the assembly language implemented by 
the processor and requires limited human intervention. At the end of the process, the 
test program can be simulated and designers are required to manually analyze only 
those parts of the description that the tool possibly failed to validate. 

The proposed approach was tested on the i8051 microcontroller using the state- 
ment coverage as verification metric. The i8051, despite its relatively old age, is still 
one of the most popular microprocessors and can be considered a good example of a 
small microcontroller. The methodology is easily applicable to more complex de- 
signs, like pipelined microprocessors, and can easily exploit different design verifica- 
tion metrics, such as branch, conditional or path coverage. 

Next Section introduces simulation-based design verification. Section 3 details the 
auto-adaptive evolutionary algorithm. Section 4 illustrates the case study and Section 
5 reports experimental evaluation. Section 6 concludes the paper. 



2 Simulation-Based Design Verification 

Given an RT-level description of a microprocessor, simulation-based verification 
requires a test program and a tool able to simulate its execution. The goal is to un- 
cover design errors, and the effectiveness of the test program is measured exploiting a 
given metric. The verification metric adopted in this paper is the statement coverage. 

To avoid confusion, in the following the term “instruction” denotes an instruction 
in an assembly program, and the term “statement” refers to a statement in an RT-level 
description. The term “execute” is commonly used in both domains: instructions in a 
program are executed when the processor fetches them and operates accordingly; 
statements in a VHDL description are executed when the simulator evaluates them to 
infer design behavior. The verification metric exploited in this paper measures the 
percentage of executed (evaluated) RT-level statements over the total when the execu- 
tion of a given test program is simulated. 

Statement coverage can be considered as a required starting point for any design 
verification process. Such analysis ensures that no part of the design missed func- 
tional test during simulation, as well as reducing simulation effort from “over- 
verification” or redundant testing. Moreover, use of coverage analysis provides an 
easy and objective way of measuring simulation effectiveness to ensure that all bugs 
would be exposed with the minimum amount of effort. Indeed, most CAD vendors 
have recently added code-coverage features to their simulators. 

Since no reliable automatic methodology exists, designers are used to write their 
own test programs to check the basic functionalities and critical corner cases. How- 
ever, devising test cases is a difficult, time-consuming task: writing a test program 
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requires a deep knowledge on the microprocessor architecture, but the designer is not 
the ideal verification engineer since he can be biased by his expectation, failing to 
check for some errors. Hand-written test cases are definitely required, but are not 
sufficient in the verification process. 

The simplest method to obtain an assembly test program is probably to compile a 
high-level routine. This approach relies on a compiler or cross-compiler, but this 
requirement may be easily met. Despite their effortlessness, compiled problem- 
specific algorithms are not the best solution. They are severely inadequate to uncover 
design errors: due to the intrinsic nature of the algorithms or of compiler strategies, 
they are seldom able to execute all statements in a description testing all functional- 
ities. 

A better strategy is to generate random assembly programs. This approach is likely 
to cover more statements in the description, but is less straightforward. A random 
generated program may easily contain illegal operations, such as division by zero, or 
endless loops. However, the effort required to generate a syntactically correct assem- 
bly source is still moderate. The main drawback of this method is that a random pro- 
gram will hardly cover all corner cases, hence resulting in low statement coverage. In 
order to obtain a sufficient coverage a large number of long programs are needed, 
resulting in overlong simulation times. 

This paper presents the Auto-Adaptive jjGP, a new approach for devising a test 
program stemming from the genetic programming (GP) paradigm. The |J,GP algo- 
rithm cultivates Turing-complete programs; internally it represents individuals as 
directed acyclic graphs (DAGs), while maps them to assembly programs during 
evaluation. The theoretical framework was presented in [4], while its first application 
was shown in [5]. 

In this paper, the pGP is significantly enhanced by adding both recombination and 
auto-adaptation mechanisms, and fitness function has been changed. The induction of 
the test case requires a syntactical description of the assembly implemented by the 
microprocessor and, exploiting the new auto-adaptive mechanisms, necessitate less 
parameter tuning. The new features both speed-up the search process and yield better 
results compared to the previous approach. 



3 Auto-adaptive pGP 

The architecture is sketched in Figure 1. The microprocessor assembly language is 
described in an instruction library, and the pGP generates efficient test programs 
exploiting it. The execution of each assembly program is simulated with an external 
tool, and the corresponding statement coverage is used to drive the optimization proc- 
ess. 

Genetic Programming (GP) was defined as a domain-independent problem-solving 
approach in which computer programs are evolved to solve, or approximately solve, 
problems [6]. GP addresses one of the more desired goals of computer science: creat- 
ing, in an automated way, computer programs able to solve problems. 

Traditional GP induced programs are mathematical functions that, after being 
evaluated, yield a specific result. The pioneering ideas of generating real (Turing 
complete) programs date back to [7]. More recently a genome compiler has been 
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proposed in [8], which transforms standard GP trees into machine code before evalua- 
tion. 

This paper exploits the versatile GP-like approach for inducing assembly programs 
presented in [4] . The methodology exploits a directed acyclic graph (DAG) for repre- 
senting ihe, flow of the program (Figure 2, left). The DAG is built with four kinds of 
nodes: prologue (followed by 1 child), epilogue (followed by no children), sequential 
instruction (followed by 1 child), and branch (followed by 2 children). 




Fig. 1. System Architecture 

• The prologue and epilogue nodes are always present and represent required opera- 
tions, such as initializations. They depend both on the processor and on the operat- 
ing environment, and they may be empty. The prologue has no parent node, while 
the epilogue has no children. These nodes may never be removed from the pro- 
gram, nor changed. 

• Sequential-instruction nodes represent common operations, such as arithmetic or 
logic ones (e.g., node B). They are always followed by exactly one child. The 
number of parameters changes from instruction to instruction, following assembly 
specification. Unconditional branches are considered sequential, since execution 
flow does not split (e.g., node D). 

• Conditional-branch nodes are translated to assembly-level conditional-branch 
instructions (e.g., node A). All common assembly languages implement some 
jump-if-condition mechanisms. All conditional branches implemented in the target 
assembly languages are included in the library. 

Each node contains a pointer inside the instruction library and, when needed, its 
parameters (i.e., operand values or register specifications). For instance, (Figure 2. 
right) shows a sequential node that will be translated into an “ORLA, Rl”, i.e., a bite- 
wise OR between accumulator and register Rl. The instruction library may contain 
different entries corresponding to the same instruction. For instance, the entry refer- 
ring to “ORL int, A”, where the parameter is the data RAM address, is different from 
the entry ""ORL A, reg” where the parameter may be @R0, @R1, RO, Rl, R2, R3, R4, 
R5, R6 or R7. 

The DAG is always translated to a syntactically correct, loop-less assembly pro- 
gram, although it is not possible to infer its semantic meaning. An induced program 
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may perform operations on any register and any memory locations, and this excep- 
tional freedom is essential to generate test programs. 





Fig. 2. DAG Representation 

The library approach enables to exploit the genetic core and the DAG structure 
with different microcontrollers or microprocessors that not only implement different 
instruction sets, but also use different formalisms and conventions. 

Adopted DAG representation prevents backward branches, either conditional or 
unconditional. This characteristic guarantees program termination, since no endless 
loop may be implemented, but introduces a small reduction in semantic power. 



3.1 Genetic Operators 

Test programs are induced by mutating the DAG topology and by mutating parame- 
ters inside DAG nodes. Both kinds of modifications are embedded in an evolutionary 
algorithm implementing a strategy. 

In more details, a population of )i individuals is cultivated, each individual repre- 
senting a test program. In each step, X new individuals are generated, parents are 
selected using tournament selection with tournament size x. Each new individual is 
generated by applying one or more genetic operators. The cumulative probability of 
applying at least n consecutive operators is equal to P(n) = p " , where parameter is 
discussed later. 

After creating new X individuals, the best p programs in the population of (|J,H-X) 
are selected for surviving. 

The initial population is generated creating |i empty programs (only prologue and 
epilogue) and then applying consecutive random mutations to each. 
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The evolution process iterates until population reaches a steady state condition, 
i.e., no improvements are recorded for generations. 

Three mutation and one crossover operators are implemented and activated with 
probability and respectively. 

• Add node: a new node is inserted into the DAG. The new node can be either a 
sequential instruction or a conditional branch. In both cases, the instruction re- 
ferred by the node is randomly chosen. If the inserted node is a branch, either un- 
conditional or conditional, one of the subsequent nodes is randomly chosen as the 
destination. Remarkably, when an unconditional branch is inserted, some nodes in 
the DAG may become unreachable (e.g., node E in Figure 2, left). 

• Remove node: an existing internal node (except prologue or epilogue) is removed 
from the DAG. If the removed node was the target of one or more branch, parents’ 
edges are updated. 

• Modify node: all parameters of an existing internal node are randomly changed. 
Parameters include immediate values and register specifications. 

• Crossover: two different programs are mated to generate a new one. First, parents 
are analyzed to detect potential cutting points, i.e., vertex in the DAG that if re- 
moved creates disjoint sub-graphs (e.g., node C in Figure 2, left). Then a standard 
1-point crossover is exploited to generate the offspring. 



3.2 Auto-adaptive Mechanisms 

The auto-adaptive pGP internally tunes both the number of consecutive random muta- 
tions and the activation probabilities of all genetic operators. Modifying these 
parameters, the algorithm is able to shape the search process significantly improving 
its performances. 

The number of consecutive random mutations is controlled by parameters p^, that, 
intuitively, molds the mutation strength in the optimization process. The popular 1/5 
rule, attributed to Rechenberg, is a well-known example of modifying the width of 
each step to increase performance. Intuitively, in the beginning it is better to adopt a 
high value, allowing offspring to strongly differ from parents. On the other hand, in 
the end of the search process, it is preferable to reduce diversity around the local op- 
timum, allowing only small mutations. The auto-adaptive pGP monitors improve- 
ments: let be the number of newly created individuals attaining a fitness value 
higher than their parents over the last H generations. At the end of each generation, 

the new value is calculated as p"''"' =a- p^ + {\- a)- . Then p^ is saturated to 

H ■ A 

0.9. And, initially, the maximum value is adopted (p^ = 0.9), considering all // (p-H^) 
individuals as improvements. Finally, the coefficient a introduces an inertia to unex- 
pected abrupt changes. 

Regarding activation probabilities, initially they are set to the same value p^^^ = p^^, 
- Pmod - Pxover ~ 0.25. Durfug evolutlou, probability values are updated similarly to 
mutation strength: let 0°’’ be the number of successful invocation of genetic operator 
OP in the last generation, i.e., the number of invocations of OP where the resulting 
individual attained a fitness value higher than its parents; and let (9, be the total num- 
ber of operators invoked in the last generation. At the end of each generation, the new 
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values are calculated as p^p = a ■ p^p + (!-«)• — — . Since it is possible that p^> 0, 

^1 

0^ may be significantly larger than X. Activation probabilities are normalized and 
forced to avoid values below .01 and over 0.9. If (?, = 0, then all activation probabili- 
ties are pushed towards initial values. 



3.3 Evaluation 

Individuals are evaluated by simulation. The DAG is first translated into a syntacti- 
cally correct assembly program and assembled to machine code. Then the execution 
of the test case is simulated on the RT-level description of microcontroller, gathering 
verification metric figures. The final value is considered as the fitness value of the 
individual, i.e., the extent to which it is able to produce offspring in the environment. 

If two programs attain the same verification metric figures, the shortest is assumed 
to have a higher fitness. 

Fitness values are used to select X parents for generating new offspring through a 
tournament of size x (i.e., x individuals are randomly selected and the fittest one is 
picked). Moreover, fitness values are used to deterministically select the best p, indi- 
viduals out of the (p-H^) ones at the end of each evolution step. 

Test program evaluation does not consider the internal structure of the microproc- 
essor, nor does it include hints for increasing the coverage based on designers’ 
knowledge. 



4 Experimental Results 



The proposed approach was tested on the 18051 microcontroller using the statement 
coverage as verification metric. 

Despite its relatively old age, the 1805 1 is one of the most popular microcontrollers 
in use today, and many derivative microcontrollers are based on, and compatible with, 
it. The 18051 is an 8-bit microprocessor originally designed in the 80’ s by Intel that 
has gained great popularity since its introduction. Its standard form includes several 
on-chip peripherals, including timers, counters, and UART’s, plus 4 Kbytes of on- 
chip program memory and 128 bytes of data memory, making single-chip implemen- 
tations possible. Its hundreds of derivatives, manufactured by several different com- 
panies include even more on-chip peripherals, such as analog-digital converters, 
pulse-width modulators, I2C bus interfaces. 



Table 1. 1805 1 RT-level description 



NAME 


MODULE 


LINES 


STMS 


CTR 


Processor core 


5,206 


2,121 


ALU 


Arithmetic Logic Unit 


429 


226 


DEC 


Decoder 


270 


220 


XRM 


External SRAM interface 


77 


11 
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The i8051 memory architecture includes 128 bytes of data memory that are acces- 
sible directly by its instructions. A 32-byte segment of this 128-byte memory block is 
bit addressable by a subset of the 1805 1 instructions, namely the bit-instructions. Ex- 
ternal data memory of up to 64 Kbytes is accessible by a special “MOVX” instruction. 
Up to 4 Kbytes of program instructions can be stored in the internal memory of the 
18051, or the 18051 can be configured to use up to 64 Kbytes of external program 
memory. The majority of the 1805 1’s instructions are executed within 12 clock cycles. 

The 18051 instructions range from 0-operand ones, like “DIV AB’’ (divide accumu- 
lator A by B) where all operands are implicit, to 3-operand ones, like the 
“CJNE Opl, Op2, RelAddr” (compare Opl with Op2 and jump if they are not equal). 
The 18051 allows 5 different addressing types: immediate, direct, indirect, external 
direct and code indirect. As in many CISC, registers are not orthogonal to the instruc- 
tions and addressing modes. 

A prototype of the proposed approach has been developed in ANSI C language in 
about 2,000 lines of code. The prototype exploits Modelsim v5.5a by Model Technol- 
ogy for simulating the design and getting coverage figures. 



Table 2. pGP parameters 



PAR 


MEANINGS 


VALUE 


h 


Population size 


5 


X 


Offspring size 


10 


X 


Tournament size (selective pressure) 


2 


'm 


Initial mutations 


100 


H 


History for auto-adaption 


4 


a 


Auto-Adaption inertia 


0.4 




Steady state 


500 



The methodology was tested on a synthesizable RT-level implementation of the 
18051 core consisting in about 7,500 VHDL lines (the corresponding gate-level netlist 
is about 12K gates). An external data RAM of 2 Kbytes was connected to the 18051, 
while no external program memory was used. 

Four main blocks can be found in the RT-level description (Table 1): the processor 
core control logic (CTR), the arithmetic and logic unit (ALU), the instruction decoder 
(DEC), and the external SRAM interface (XRM). The CTR is described behaviorally 
as a sequential logic block; the ALU is described behaviorally as a combinational 
logic block; the DEC is described as a data-flow implementing a combinational logic 
block; the XRM models an external SRAM. For each block Table 1 reports the total 
number of VHDL lines [LINES] and the number of statements [STMS]. 

Inducing a test program with the proposed GP required about 1,200 generations, 
corresponding to the evaluation of about 12,000 programs. The experiments em- 
ployed about 12 hours of CPU time on a SPARC ULTRA Workstation at 400MHz 
with 2GB of RAM. Table 2 shows the adopted parameters. 

To assess the efficiency of the proposed method, the induced test program was 
compared with 5 programs devised with 3 different methodologies: compiled prob- 
lem-specific algorithms; random test programs; and exhaustive functional test case. 
Table 3 compares the different programs in term of required program ROM bytes 
[SIZE] and instructions executed by the program [INST]. Statement coverage figures 
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are reported in column [TOT]. Statistics are also detailed for the 4 blocks [ALU], 
[CTR], [DEC] and [XRM], 



Table 3. Experimental Results 



Test program 


Pro 


gram 


Statement Coverage [%] 


SIZE 


INST 


ALU 


CTR 


DEC 


XRM 


TOT 


Fibonacci 


324 


1,176 


49.60 


30.20 


62.70 


81.80 


34.70 


int2bin 


81 


572 


49.60 


21.30 


56.40 


81.80 


27.10 


Random (size = GP) 


648 


334 


86.70 


87.60 


93.20 


100.00 


88.20 


Random (size = 4K) 


4,096 


2,373 


96.50 


94.60 


97.70 


100.00 


95.00 


TestAll (exhaustive) 


2,834 


52,953 


95.10 


99.40 


100.00 


100.00 


99.10 


ATS2002 


469 


228 


99.60 


99.70 


100.00 


100.00 


99.70 


Auto-adaptive rGP 


292 


183 


100.00 


99.80 


100.00 


100.00 


99.84 





Fig. 3. Activation probabilities 

The two problem-specific algorithms are Fibonacci and int2bin. The former calcu- 
lates the Fibonacci series, while the latter converts an integer to a binary representa- 
tion. As expected, the coverage figure is quite low. Looking at the decoder, it may be 
easily inferred that only a subset of the instruction set is used. Both programs execute 
loops (the number of executed instructions is higher than the number of stored ones). 
Neither program accesses the external data RAM. 

Two different random test programs were considered. The former. Random 
(size = GP), was devised using the same effort as the GP. 20,000 random programs of 
approximately 500 instructions were generated and the best one was chosen. The 
comparison allows evaluating the effectiveness of the evolutionary core in driving the 
search process. The latter, Random (size = 4K), was devised generating 20,000 ran- 
dom programs filling all available ROM space and selecting the best one. It is in- 
cluded here to allow an estimation of the best result attainable with the random ap- 
proach disregarding efficiency. 

The exhaustive functional test program TestAll is also considered. The test case 
was devised by microprocessor designers; it is relatively long and includes several 
loops. It tests all possible instructions, although it is not able to check all possible 
corner cases in the implementation. For instance, “DIV AB” when A is less or equal 
than B was not taken into account. 
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Fig. 4. Mutation strength 
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Finally, results attained by the basic |tGP [5] are shown in ATS2002. 

The induced test program got the highest statement coverage figure with the small- 
est size and the lowest run time. Execution of the test case is fast, since there are no 
loops by construction. Compared with original |J,GP, the auto-adaptive algorithm is 
able to produce better and shorter test programs, with even reduced human effort. 

Figure 3 shows the evolution of activation probabilities during the experiment, the 
adapted mutation strength is shown in Figure 4. Parameter values are plotted against 
the generation number. 

It may be surprising the high activation probability of the Delete operator. How- 
ever, the initial population is generated applying consecutive random mutations to 
empty programs and the average length of the initial programs is about 60 instruc- 
tions. In this situation, it is useful for the |tGP to delete instructions. Interestingly, 
experimental evidence suggests that when the |J,GP starts with a population of empty 
programs = 0) in the first hundreds of generations the increases dramatically, 
but after this first phase the trend comes close to the one showed in Figure 3. 

The mutation strength adaptation is more predictable: in the beginning is about 

= 0.6 and then slowly decrease to 0. The last 300 generations are plotted even if the 
|lGP already reached the steady-state condition to show activation probabilities reach- 
ing back their default values. 

Performances of the Auto- Adaptive |tGP are comparable with those attained by the 
standard |J,GP; however, results are significantly better. This is due to the fitness that 
takes into account program length and to the adaptive activation probabilities. It may 
be interesting to show how the simple mutation strength p , influences performances. 

Figure 5 compares performance of the standard jiGP against the auto-adaptive one, 
where only parameter p^ is tuned. The statement coverage is plotted against the gen- 
eration number, only the first 800 generations are considered. Graph shows clearly 
that the enhanced jiGP is faster and able to reach a higher maximum. 
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Fig. 5. Mutation strength effectiveness 



5 Conclusions 



This paper presents the Auto-Adaptive /iGP, a methodology able to automatically 
induce an assembly test program for a microcontroller. The methodology is based on 
an auto-adaptive evolutionary algorithm and exploits the syntactical description of the 
language. 

Unless commonly adopted methodology, it does not require a skilled programmer 
or additional analysis to be performed by verification engineers. The generated test 
case is able to efficiently maximize a given verification metric, and, at the end of the 
process, designers are required to manually analyze only those parts of the description 
that the tool failed to validate. 

A prototype of the proposed approach has been developed in ANSI C language, 
and then it was tested on a synthesizable RT-level implementation of the i8051 using 
statement coverage as verification metric. Induced test programs outperformed test 
cases devised with alternative methodologies. 

Experimental results show the efficiency of the methodology. Devising a test case 
able to reach the complete statement coverage is a difficult task, even on a small mi- 
crocontroller like the i805 1 . Reported data show that random programs can hardly test 
all corner cases and also long, carefully designed hand-made test cases may not be 
exhaustive. The automatically induced test program, conversely, was able to cover 
almost all (99.84%) of the statements in the description. 

The algorithm is able to tune internal parameters to optimize search process. New 
auto-adaptive mechanisms dramatically enhance both performances and quality of the 
results, as a result, the proposed approach betters the original pGP generating a test 
program that is both shorter and able to reach higher coverage. 

Current work is targeted to apply the proposed approach to more complex proces- 
sors. 
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Abstract. This paper proposes a coprocessor architecture to speed up 
hardware evolution. It is designed to be implemented in an FPGA with 
an integrated microprocessor core. The coprocessor resides in the con- 
figurable logic, it can execute common genetic operators like crossover 
and mutation with a targeted data throughput of 420 MByte/s. Together 
with the microprocessor core, a complex evolutionary algorithm can be 
developed in software, but is processed at the speed of dedicated hard- 
ware. 



1 Introduction 

An evolvable hardware platform usually consists of three main parts: the recon- 
figurable hardware (RH) and two processing units: the fitness calculation unit 
(FCU) and the evolutionary algorithm unit (EAU). There are different possibili- 
ties to implement the processing units (FCU and EAU) like PC-based approaches 
(Q, 0), DSPs 0, FPGAs 0 and ASICs. 

These implementations can be classified whether they are software- (PC and 
DSP) or hardware-oriented (FPGA and ASIC). While the first option provides 
the advantage of variability, easier design and maintainability, it is relatively 
slow. A hardware implementation allows a higher degree of parallelism and is 
therefore faster, but usually needs more design effort. 

Our group uses an analog neural network ASIC (ANN) 0 as the reconfigur- 
able architecture (see Fig. 0). Its configuration stream consists of about 45 kByte 
of data and it calculates one network layer every 20 ns. To exploit the network’s 
speed, substantial computing power and data-flow rate are needed to generate 
new generations in time. 

In this paper we propose a coprocessor architecture to bring up the evolu- 
tionary algorithm to the speed of dedicated hardware. It is customized to suit 
the demands of our ANN, but the architecture is flexible enough to be trans- 
ferred to other reconfigurable hardware systems. The coprocessor is designed for 
a Virtex-II Pro FPGA manufactured by Xilinx 0. This FPGA provides pro- 
grammable logic cells, high speed serial links for off-chip communication and an 
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Fig. 1. Setup used for the evolvable hardware experiments 



IBM PowerPC 405 core immersed in the FPGA fabric. With this device we are 
able to partition between soft- and hardware: 

The evolutionary algorithm is written in software and executed in the micro- 
processor core. The coprocessor resides in the programmable logic cells and is 
coded in a hardware description language. The problem-specific fitness calcula- 
tion is not a topic in this paper. It is done either by software in the microprocessor 
core or in specialized hardware in the FPGA. 

The instruction set and the internal structure of the coprocessor are designed 
to allow a wide range of evolutionary operators. This makes it possible to use the 
same hardware configuration code for a large number of different experiments 
and training algorithms. 

With this coprocessor system it is possible to design complex evolutionary 
algorithms in a software environment, but let the time-expensive parts be ex- 
ecuted in hardware. This yields a data throughput of up to 420 MByte/s of 
genetic data. 

2 Evolution System 

Our hardware integrates a Xilinx FPGA Virtex-II Pro 0 together with 
256 MByte of DDR-SDRAM 0 and an analog neural network ASIG on a single 
PGbQ as shown in Fig. Q It is interfaced to a host computer. This connection is 
used only to initialize the microprocessor core, the random generators, etc. and 
to monitor the evolution progress. 



Analog Neural Network ASIC. The target of the evolution is a neural 
network. It is implemented in a mixed-signal ASIG developed by our group, 
partitioned in four network blocks containing 128 input neurons and 64 output 
neurons each. This sums up to 32768 synapses. Taking into account the resolution 
of 11 bit per synapse, our configuration bit stream, i.e. the genome, is 360448 
bit (44 kByte) long (see [0). 



^ Printed Circuit Board (PCB) 
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Fig. 2. Evolution system 



This ASIC is connected to the FPGA by a parallel LVDS0 link, capable of 
transferring up to 1.2 GByte/s of configuration or testpattern data. To evaluate 
an individual or genome, the configuration data has to be transferred to the ASIC 
first. After this is done, the testpatterns are applied successively and finally the 
results for each testpattern are sent back to the FPGA across the LVDS link. 



Xilinx FPGA Virtex-II Pro. The coprocessor is designed for the Virtex-II 
Pro FPGA. Apart from the programmable logic, it contains 504 kBit internal 
dual-ported SRAM, hardware multiplier and high speed serial links for off-chip 
communication. Its outstanding feature is the PowerPC, a microprocessor core 
delivering 420 Dhrystone MIPS. As the PowerPC is immersed in the FPGA 
fabric, it can access part of the internal SRAM as fast as its own cache. Our 
coprocessor connects to the additional SRAM port, i.e. we use the dual ported 
internal SRAM to communicate between microprocessor core and coprocessor. 



Peripherals and Scalability. The internal SRAM provided by the FPGA is 
neither sufficient to store the genome data nor the testpatterns. Therefore, we 
added a DDR-SDRAM module to the PCB for the FPGA to use up to 1 GByte 
of external memory at a transfer rate of up to 2.1 GByte/s. 

The evolution system is designed in a way that allows to combine 16 systems 
on one backplane. These modules are connected to each other with the serial 
links described above. Each FPGA offers four links, we are targeting a 16 node, 
2-dim torus, with each connection transferring up to 3.125 Gbit/s. 



3 Evolutionary Algorithm Coprocessor (EAC) 

3.1 Overview 

The EAG pursues two aims: It must process a high data throughput and it has 
to be parameterized to allow a wide range of evolutionary algorithms without 
changing the hardware configuration. The first aim demands a pipelined struc- 
ture. To satisfy the second this pipeline is controlled and managed by a set of 



^ Low- Voltage Differential Signalling (LVDS) 
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Fig. 3. Instruction buffer and command structure 



instructions. This structure is shown in Fig. 01 The microprocessor core addresses 
the EAC by writing instructions to the Instruction Buffer (IB). They are read 
and decoded successively. Depending on the instruction they control different 
parts of the pipeline, indicated with dashed arrows in Fig. 01 



3.2 Data Flow Control 

All genetic data is stored in the SDRAM. A population consists of a number of 
individuals, represented by their genomes. Each genome is divided into chromo- 
somes. A chromosome is stored consecutively in the SDRAM and crossover takes 
place inside one chromosome. Each chromosome contains an arbitrary number 
of genes, the smallest unit of the genome. The mutation operator works on single 
genes. A typical ANN training setup uses 10-100 individuals and one chromo- 
some per neuron. The coprocessor can handle any combination that fits into 
memory. 

A more detailed view of the pipeline and the surrounding control structures 
is given in Fig. 0 The genes are read from the SDRAM as parents, modified in 
one crossover and two mutation stages and written back as children to a new 
address in the SDRAM. 

In order to feed genetic data into the pipeline (or to write it back to the mem- 
ory), the software must specify the SDRAM address and the number of genes 
to be transferred. The SDRAM works burst-oriented, i.e. consecutive addresses 
can be accessed very fast while random access is relatively slow. Therefore, the 
chromosomes are stored consecutively in the SDRAM, and FIFOs accumulate 
the data in order to address the SDRAM in bursts. 



Parallelism. A gene is expressed with 11 bit resolution. As described below, we 
need two additional bits to control the evolutionary operators. To simplify the 
addressing, each gene occupies two bytes, so three bits are available for future 



use. 
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Fig. 4. The pipeline with its control structnre 



The design is targeted for 80 MHz. Four pipelines work in parallel to take 
advantage of the SDRAM transfer rate. Fig. ^depicts one pipeline processing 
one gene per cycle. Provided, enough genetic data and instructions are supplied, 
the four pipelines can process four genes per cycle. 

3.3 Combinatorial Logic 

Fig. 0 shows a detailed description of the evolution pipeline. There are two 
different kinds of control options: 

— Multiplexer Control Bits 

— Global Parameters 

The Multiplexer Control Bits (SelectParent, SelectRandom, SelectScale, 
SelectConstant, SelectReplace and SelectMutate) in Fig. 0are numbered 
from one to six, they may change with every new gene. For example, 
SelectMutate decides whether the original gene is passed down unaltered or 
undergoes mutation. As this is done for each gene independently, SelectMutate 
changes almost with every new gene. 

The Global Parameters (RandomMaximum, Constant and Fraction) are rec- 
ognizable by dashed boxes. They are at least valid for a whole chromosome, in 
most cases for the entire evolution. RcindomMaximum for example may be set to 
1023 — 1) at the beginning to get mutated genes with every possible value. 

Later, as the fitness increases, RandomMaximum might be decreased to fine-tune 
the population. 

This difference in the update frequency is important. The Global Parameters 
are updated infrequently, therefore, the software can set them with instructions. 
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Fig. 5. The pipeline in detail 



Since the Multiplexer Control Bits may be updated every cycle, they must be 
generated faster to achieve a substantial data throughput. 



Multiplexer Control Bits (MCB). With the setting of the MCBs, one can 

1. choose from which parent the child inherits the gene (SelectParent). 

2. choose between a uniform or gaussian random distribution (SelectRcUidom). 

3. scale the new gene value (SelectScale). 

4. choose between a random or a constant new gene (SelectConstant). 

5. choose whether the generated gene shall be added to the original or replace 
it (SelectReplace). 

6. decide whether a mutation occurs or not (SelectMutate). 

As stated above, the MCBs may change their values with every new gene, i.e. 
every new cycle. Thus they cannot be set individually by software instructions 
since this would be too slow. On the other hand, the MCBs actually define the 
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Fig. 6. A LUT sets the MCBs 



underlying evolutionary algorithm. Still the description of the algorithm must 
be based in software to be variable. 

This contradiction is solved by introducing an additional set of bits, the 
Evolution Control Bits (ECB). These ECBs generate the MCBs via a look-up 
table (LUT): The LUT address is given by the ECBs, the LUT entries are written 
by software, and the LUT outputs equals the MCBs (Fig. 0. There are three 
kinds of ECBs: 

— Instruction dependent bits 

Two Evolution Control Bits, Maskl and Mask2 are set by instructions. For 
example they can be used to set crossover points: To transfer the necessary 
data for a two-point crossover with crossings after the 27th and 78th bit for a 
chromosome with 128 genes, the run-length-coded mask would be (27,51,50). 

— Random bits 

Two Evolution Control Bits, Randl and Rand2, are generated by a random 
generator each. They are used to decide whether a mutation occurs or not. 
Their output equals ‘1’ with a probability given by an instruction. 

— Gene dependent bits 

Four Evolution Control Bits, Mother 1, Mother 2, Father 1 and Father 2 are 
obtained from the parent genes. They can be used to steer the evolution, 
for example, to exclude a single gene from mutation if Mother! and Fatherl 
equals ‘O’. As stated above, our genes are 11 bit long plus two Evolution 
Control Bits for each gene. 

Now each of the six Multiplexer Control Bits depends on the eight Evolution 
Control Bits. The exact dependency is controlled by the software. We are using 
a look-up table (LUT) based structure, where the LUT address is given by the 
Evolution Control Bits. 

For each Multiplexer Control Bit one 8 bit LUT is needed. The ECBs form 
the input or address to that LUT, while the software specifies the LUT entries. 
Note that these entries are generally only written once, but can be changed if 
the necessity arises. 

This is best understood with an example: A mutation algorithm shall use two 
different mutation rates, 2% and 5%. There are genes with the high mutation 
rate, genes subject to the low mutation rate and genes without any mutation at 
all. The Motherl and Fatherl bits are used to differentiate between the three 
cases. A gene shall be mutated with 5% probability if both fatherl and motherl 
bits are set to ‘1’, but with a probability of 2% if either the father’s bit or the 
mother’s bit is set to ‘1’. No mutation shall occur if both bits are set to ‘O’. 
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The LUT implementing this behavior is shown in Tab. n (This example uses 
only a Tbit, instead of an 8-bit LUT as described in the text. The remaining 
four Evolution Control Bits (Mother2, Father2, Maskl and Mask2) are treated 
as don’t cares for this setup). 



Table 1. Example LUT with four LUT-Address-Bits. An ‘X’ in the table denotes 
‘don’t care’. To obtain the full LUT, all rows with an ‘X’ must be duplicated and ‘O’, 
resp. ‘1’ has to be inserted instead of the ‘X’ 



Father 

Bitl 


Mother 

Bitl 


Randl 

5% 


Rand2 

2% 


Do 

Mutate 


Explanation 


0 


0 


X 


X 


0 


Both gene bits ‘O’: no mutation, inde- 
pendent of the state of the rand bits 


1 


0 


X 


0 


0 


Exactly one gene bit ‘1’: 


1 


0 


X 


1 


1 


mutation occurs if Rand2 (2%) 


0 


1 


X 


0 


0 


equals ‘1’, i.e. with a 


0 


1 


X 


1 


1 


probability of 2% 


1 


1 


0 


X 


0 


Both gene bits ‘1’: mutation 


1 


1 


1 


X 


1 


occurs if Randl (5%) equals ‘1’ 



3.4 Memory Organization and Instruction Set 

Memory Organization. Our system offers two kinds of memory, FPGA inter- 
nal SRAM (504 kBit) and off-chip DDR-SDRAM (up to 1 GByte). The first is 
used to communicate between processor core and Goprocessor. It is dual ported 
and can be randomly accessed independently from both ports. The genetic data, 
i.e. the genomes, are stored in the off-chip SDRAM, they are far too big to fit 
into the FPGA. 

The IB is 32-bit wide. It holds the instructions and additional data and 
addresses, which are used for indirect addressing, as described below. 



Instruction Set. The IB does not only hold the instructions, but can also con- 
tain data and addresses. Some instructions do not transfer all their information 
directly. To transfer the LUT entries, the microprocessor core first writes them 
into the data section of the IB. After that, it issues the instruction 
SetLUTSelectParent (IB address). Now the coprocessor looks in the IB at 
the transmitted address for the data. This indirect addressing is done for two 
reasons: 

First, some instructions must transfer more than 32 bit of data to the co- 
processor. To keep a consistent instruction length of 32 bit, this data has to be 
transferred indirectly. 

Second, using indirect addressing enables the microprocessor core to gener- 
ate an address table in the IB. Depending on how many memory locations for 
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Table 2. Coprocessor instruction set 



Instruction 


Parameters 


Representation 


SetLUTSelectParent 

SetLUTSelect Random 

SetLUTSelectScale 

SetLUTSelectConst 

SetLUTSelectReplace 

SetLUTSelectMutate 


(IB address) 
(IB address) 
(IB address) 
(IB address) 
(IB address) 
(IB address) 


Instruction identifier bit(0-7) 
Instruction Buffer Address: 
bit(8-18) 


SetRandomMaximum 
SetFraction 
SetConstant 
SetOneProbability 1 / 2 


(RandomMaximum) 
(Fraction) 
(Constant) 
(probability! /2) 


Instruction identifier bit(0-7) 

Value: 

bit(8-23) 


MaskIndirectBin 

MaskIndirectRunLength 


(IB adr, words to read) 
(IB adr, words to read) 


Instruction identifier bit(0-7) 
Instruction Buffer Address: 
bit(8-18) 

words to read: bit (19-31) 


MaskDirectRunLengthO 

MaskDirectRunLengthl 


(quantity of zeros) 
(quantity of ones) 


Instruction identifier bit(0-7) 
quantity of ones/zeros: 
bit (8-23) 


ReadParentl 

ReadParent2 

WriteChild 


(IB adr) 
(IB adr) 
(IB adr) 


Instruction identifier bit(0-7) 
Instruction Buffer Address 
bit(8-18) 



chromosomes the algorithm needs, SDRAM addresses are written into the IB, 
followed by the length (i.e. the number of genes) per chromosome. To feed a 
sequence of genetic data into the pipeline during the evolution, the microproces- 
sor core only transmits the corresponding IB addresses. The coprocessor then 
reads the SDRAM address and the number of genes to transfer from the IB. This 
simplifies the address management and reduces address transmitting overhead. 
The instruction set can be divided into two parts (see Tab. 0: 

The first group of instructions must be given once to initialize the coprocessor 
with valid data. The LUT entries are written with an indirect addressing scheme. 
The Global Parameters must be specified directly. Both may change infrequently 
during the evolution. 

The second group of instructions is given once for each chromosome. A mask 
must be given to define the crossover point (s). A mask can be defined directly 
or indirectly, using a run-length-code or by transferring all mask bits in binary. 

Finally, the data flow control must be told where to get the parents, where 
to store the child and how many genes to transfer. This is done indirectly. As 
mentioned above, an address table in the IB is created. All data flow control 
instructions refer to that table. 

This instruction set provides a wide range of possibilities for the software to 
implement various evolutionary algorithms. For example, the initial population 
can be generated just by setting the mutation rate to 100%, crossover opera- 
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tors using 3 or even 4 parent can be implemented with two (or three) 2-parent 
crossover operators in succession. It is also possible to just add a fraction or a 
constant value to each gene. 

3.5 Performance Analysis 

To estimate the performance of our evolution system we must consider not only 
the pipeline data throughput but the SDRAM transfer-rate and the micropro- 
cessor core performance as well. 

— The pipeline is able to process four 11-bit-genes per cycle. With the targeted 
FPGA frequency of 80 MHz this sums up to 320 million genes or 420 MByte 
per second. 

— Each gene must be read twice and written once, leading to an SDRAM 
transfer rate of approximately 1.26 GByte/s. This is 59% of the maximum 
transfer rate provided by the DDR-SDRAM. 

— A chromosome on the average needs four instructions to be fed into the 
pipeline0 Gonsidering the microprocessor core frequency of 300 MHz, a typ- 
ical chromosome length of one chromosome per neuron (176 Byte for our 
ANN) and the pipeline data throughput of 420 MByte/s, the microprocessor 
core has around 125 cycles to compute and issue the four instructions men- 
tioned above to process a chromosome and to keep pace with the pipeline. 

This rough estimation shows that our system is able to run at the calculated 
speed of the pipeline. 

4 Example: Crossover and Simple Mutation 

This section presents an example of how to instruct the coprocessor to perform a 
crossover followed by a simple mutation. The software sets the crossover points 
directly and demands a mutation rate of 5%. 

Tab. Qshows the complete LUT, which is quite simple: All four parent bits, 
the MaskBit2 and the RaindomBit2 are not used. The SelectParent equals 
MaskBitl and the SelectMutate equals RandBitl. All other MGBs are con- 
stant, SelectRandom is ‘0’ to choose a uniform distribution, SelectScale is ‘T 
to scale with RandomMaximum, SelectConstcint is ‘0’ to use a randomly chosen 
new gene, SelectReplace is ‘1’, i.e. in case of a mutation we replace the original 
gene, instead of adding something to it. 



Global Parameters. Our mutated genes are completely random with mutation 
rate 5%, they cover all 11 bit, therefore we set the instructions: 

SetRandomMaximum (1023); 

SetOneProbabilityl (51); 

^ These four instructions are: ReadParentl, ReadParent2, WriteChild and one mask 
instruction. 
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Table 3. Example LUT with four LUT-Address-Bits (X : don’t care) 



Mask Mask Rand Rand Mother Father Select Select Select Select Select Select 
Bitl Bit2 Bitl Bit2 Bitl/2 Bitl/2 Parent Rand Scale Const Replace Mutate 



0 


X 


0 


X 


X 


X 


0 


1 


0 


1 


0 


0 


1 


X 


0 


X 


X 


X 


1 


1 


0 


1 


0 


0 


0 


X 


1 


X 


X 


X 


0 


1 


0 


1 


0 


1 


1 


X 


1 


X 


X 


X 


1 


1 


0 


1 


0 


1 



Note that the remaining Global Parameters are not used in this setup and 
that the mutation probability is not expressed in per cent, but in per 1024 
(51 / 1024 ~ 0.05). 

Specifying the Crossover Mask. The SelectParent LUT is programmed 
to let the MaskBitl decide from which parent the child inherits its gene. This 
MaskBitl is set with: 

for (ChromoCount=0 ; ChromoCount<MaxChromo ; ChromoCount++) { 
MyRand = Random (1 28 ) ; // between [1..128] 
MaskDirectRunLengthO(MyRand) ; 

MaskDirectRunLengthl (128-MyRand) ; } 

Each loop cycle will take (MyRand) genes from the first parent and (128-MyRand) 
genes from the second. 



Data Flow Commands. Prior to issuing any data flow commands, the software 
has to write an address table into the IB. With this table, all chromosomes are 
addressable by pointers. To cross, for example, the fittest with the second-fittest 
genome, we need to issue the instructions: 

ReadParentl (IB_adr = Genome [Best] ) ; 

ReadParent2(IB_adr = Genome [Second] ) ; 

WriteChild (IB_adr = Genome [Unusedl] ) ; 

Note that the pointers Best, Second, etc. in general change their value with 
every new generation. 

A complete evolution starts with the creation of an initial population (by 
setting the mutation rate to 100%). This population is evaluated and ranked ac- 
cording to the fitness. Based on this ranking, the software decides which genomes 
are crossed and issues the appropriate data flow and mask instructions, as shown 
in the examples above. After all genomes have been crossed and mutated, the 
evaluation starts again. This circle is continued until a satisfactory fitness is 
reached. 




Speeding up Hardware Evolution 285 



5 Conclusion and Outlook 

We have introduced a coprocessor architecture to speed up evolutionary algo- 
rithms designed for the FPGA Virtex-II Pro. The coprocessor is able to perform 
common genetic operators like crossover and mutation with a data throughput of 
up to 420 MByte/s due to pipelining and parallelism. It still has the flexibility of 
software. Therefore, wide range of evolutionary algorithms can be designed and 
maintained in software, but are processed at high speed in dedicated hardware. 

In the future we plan to implement additional operators, e.g. the possibility 
to add two genes together during the crossover. Also, we are going to adapt 
the coprocessor to another reconfigurable hardware: The field programmable 
transistor array (FPTA) developed by our group (see [EJ). 

We also want to connect several of the proposed systems using the high 
speed serial links offered by the FPGA. This distributed, autonomous hardware 
evolution system will give us the resources to tackle large scale optimization 
problems. 
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Abstract. In this paper we describe the hardware evolution of analog circuits 
performing signal separation tasks using JPL’s Stand-Alone Board-Level Ev- 
olvable System (SABLES). SABLES integrates a Eield Programmable Transis- 
tor Array chip (FPTA-2) and a Digital Signal Processor (DSP) implementing 
the Evolutionary Platform (EP). The FPTA-2 is a second generation reconfigur- 
able mixed signal array chip whose cells can he programmed at the transistor 
level. Its chip architecture consists of an 8x8 matrix of reconfigurable cells. The 
FPTA-2 is reconfigured by evolution to achieve circuits that can extract a target 
signal that is combined with an undesired component or to perform the separa- 
tion of a combination of two signals. The paper considers also an adaptive filter 
where the fitness function depends on the input signal. The results demonstrate 
that SABLES is not only able to perform signal separation and extraction, but it 
is also flexible enough to adapt to different input signals without human inter- 
vention, such as in the case of self-tuning and adaptive filters. 



1 Introduction 

This work describes the evolution of circuits that perform signal separation using the 
most recent version of the Field Programmable Transistor Array chip, the FPTA-2. 
Source or signal separation is a classic signal processing problem [1]: given A physi- 
cally distinct measurements which represent a priori unknown linear combinations of 
N independent signal sources, the objective is to auto-adaptively extract N equivalent 
statistically independent signals, e.g., arising from independent physical sources. One 
approach to solve this problem is the H-J adaptive network, which is based on the 
modified Hebbian learning rule [2]. Cohen and Andreou [1] proposed a hardware im- 
plementation for a Neural Network implementing the H-J adaptive algorithm and 
tested it for the case of separating two sine waves (782Hz and IkHz) applied at the 
network input, after being mixed. The interest in the signal separation problem is mo- 
tivated by applications in communication systems, such as adaptive noise reduction 
and adaptive echo cancellation. 

Particularly, our focus is to demonstrate the synthesis of circuits that can perform 
signal extraction and separation using the SABLES platform. This class of applica- 
tions encompasses experiments in which the reconfigurable hardware evolves to a 
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new configuration when the input signal characteristic changes, leading to potential 
applications such as adaptive noise elimination. Additionally, this paper also de- 
scribes an experiment accomplishing the separation of two source signals from a mix- 
ture of these signals given at the circuit input. 

SABLES was used in the evolutionary experiments. This system solution provides 
autonomous, fast (about 1,000 circuit evaluations per second), on-chip circuit recon- 
figuration. Its main components are a JPL Field Programmable Transistor Array chip 
as transistor-level reconfigurable hardware, and a TI DSP implementing the evolu- 
tionary algorithm as the controller for reconfiguration. 

This article is organized as follows. Section 2 describes the SABLES and the 
FPTA-2 chip. Section 3 describes the experiments, including the evolution of tunable 
filters excited by an “unknown” combination of source signals; and the evolution of a 
circuits performing signal separation. Section 4 summarizes the conclusions and fu- 
ture applications. 



2 SABLES 

SABLES integrates an FPTA and a DSP implementing the Evolutionary Platform 
(EP) as shown in Figure 1. The system is stand-alone and is connected to the PC only 
for the purpose of receiving specifications and communicating back the results of 
evolution for analysis [3]. 




Fig. 1. Block diagram of a simple stand-alone evolvable system. 

The FPTA is an implementation of an evolution-oriented reconfigurable architec- 
ture (EORA) [4]. The lack of evolution-oriented devices, in particular for analog, has 
been an important stumbling block for researchers attempting evolution in intrinsic 
mode (with evaluation directly in hardware). Extrinsic evolution (using simulated 
models) is slow and scales badly when performed accurately e.g. in SPICE, and less 
accurate models may lead to solutions that behave differently in hardware than in 
software simulations. The FPTA has transistor level reconfigurability, supports any 
arrangement of programming bits without danger of damage to the chip (as is the case 
with some commercial devices). Three generations of FPTA chips have been built and 
used in evolutionary experiments. The latest chip, FPTA-2, consists of an 8x8 array of 
reconfigurable cells. Each cell has a transistor array as well as a set of other pro- 
grammable resources, including programmable resistors and static capacitors. Figure 
2 provides a broad view of the chip architecture together with a detailed view of the 
reconfigurable transistor array cell. The reconfigurable circuitry consists of 14 transis- 
tors connected through 44 switches and is able to implement different building blocks 
for analog processing, such as two- and three-stage OpAmps, logarithmic photo 
detectors, or Gaussian computational circuits. It includes three capacitors, Cml, Cm2 
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and Cc, of lOOfF, lOOfF and 5pF respectively. Details of the FPTA-2 can be found in 
[5]. Other implementations of reconfigurable analog devices can be found elsewhere 
[ 6 - 10 ], 

The evolutionary algorithm was implemented in a DSP that directly controlled the 
FPTA-2, together forming a board-level evolvable system with fast internal commu- 
nication ensured by a 32-bit bus operating at 7.5MHz. Details of the EP were pre- 
sented in [11]. Over four orders of magnitude speed-up of evolution was obtained on 
the FPTA-2 chip compared to SPICE simulations on a Pentium processor (this per- 
formance figure was obtained for a circuit with approximately 100 transistors; the 
speed-up advantage increases with the size of the circuit). The evaluation time de- 
pends on the tests performed on the circuit. Many of the evaluation tests performed 
required less than two milliseconds per individual, which for example on a population 
of 100 individuals running for 200 generations required only 20 seconds. The bottle- 
neck is now related to the complexity of the circuit and its intrinsic response time. 
SABLES fits in a box 8” x 8” x 3”. 




Fig. 2. FPTA 2 architecture in the left and schematic of cell transistor array (re-configurable 
circuitry) in the right. The cel contains additional capacitors and programmable resistors (not 
shown). 



3 Experiments 

The results for three different classes of experiments are presented in this section. The 
first two experiments demonstrate the evolution of circuits for signal extraction, while 
the last experiment shows the evolution of a circuit for signal separation. In the first 
case the evolving circuit is excited with combination of signal sources of known fre- 
quencies f| and fj, e.g., the known input frequencies are used by the fitness evaluation 
function. The objective is to amplify the signal at frequency f^ while attenuating the 
signal at frequency In the second case the circuit is excited with a combination of 
signal sources of (\it unknown frequencies, e.g., knowledge of the input sine waves 
frequencies is not used by the fitness evaluation function. The objective of the ex- 
periment is to evolve a circuit that amplify the strongest component of the input signal 
and attenuate the weakest one, therefore improving a hypothetical signal/noise ratio. It 
is shown that when the frequency of the strong and/or weak component in the input 
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signal changes, the circuit is reconfigured by evolution to “adapt” to the new input 
profile. Finally, in the third experiment evolution performs signal separation: the 
evolving circuit is excited with two linear combinations of pure sine waves (source 
signals) of known frequencies, and the circuit outputs are the original source signals. 

Based on prior experimentation, the GA parameters selected for these experiments 
were: 70% mutation rate; 20% crossover rate; replacement factor of 20%; population 
of 400; and 100 to 200 generations. A binary representation was used, where each bit 
determines the state (opened, closed) of a switch. Each execution took about 5 min- 
utes in the SABLES system. More than 20 different GA executions were performed. 
In order to compute the fitness function, the LET of the output signal(s) from the 
EPTA-2 was calculated. The fitness was a measure of the error of the EPTA-2 outputs 
to the target values in each experiment, as shown below. 

N 

Fitness = '^\Oj: (i) - Tj (i) I (1) 

i=l 

where N is the number of samples used in the EFT (usually 64), 0^(1) is the magni- 
tude of the i“’ FFT component of the FPTA-2 output, and T/i) is the target magnitude 
of the i“’ FFT component. Other fitness measures such as the sum of the squared de- 
viations between the output and the target were tried, without significant improve- 
ment. 

Referring to the input signals frequencies, they were restricted to values below 
50kHz, because the data converters at SABLES operate at lOOkHz sampling rate and 
due to bandwidth limitations of reconfigurable analog devices [10]. 

Another important feature of these experiments refers to the switches control volt- 
ages. The switches of the re-configurable chip are implemented as transmission gates. 
The control voltages that completely open or close the switches are 0 and 2V. How- 
ever, through experimentation, it has been observed that the results significantly im- 
prove when the values 0.4V and 1.6V are used to control the switches, meaning that 
they are now partly opened and closed (partly closed if the higher and lower control 
voltages are respectively applied to the NMOS and PMOS transistors of the switch, 
and partly opened in the other case). 

Einally, another important issue is the search space size. If we allow a completely 
unconstrained evolution, we will end up with a very large search space size. One ap- 
proach to reduce the search space size is to have the EPTA-2 cells constrained to a 
particular topology, so that only the interconnections among the cells and the feed- 
back resistance values [5] are evolved. Through experimentation, it has been verified 
that the constrained approach delivered better results. In the following experiments 
the cell topologies were fixed to the one of inverting amplifiers. 



3.1 Combination of Known Signal Sources 

To evolve the circuit we introduce an input signal consisting of a linear combination 
of lOkHz and 25kHz sine waves. The objective is to eliminate the 25kHz tone and 
amplify the lOkHz tone. Lour cells of the EPTA-2 were used in this experiment. Eig- 
ure 3 provides a high level schematic of the input/output interconnections in this ex- 
periment. Eigure 4 depicts the EFT of the input signal and of the evolved circuit out- 
put. Figure 5 shows the evolved circuit response in the time domain, when excited by 
lOkHz and 25kHz tones respectively. 
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Fig. 3. Input (In)/Output(Out) connection in the signal extraction experiment. Only the four 
cells at the top-left of the array were used. 




Frequency (Hz) 



Frequency (Hz) 



Fig. 4. Evolved circuit response in the frequency domain: EFT of the input (left); and output 
(right) signals. 




Fig. 5. Evolved circuit response in the time domain: when excited by a lOkHz input signal 
(left) and by a 25kHz input signal (right). 
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The evolved circuit displayed a 15dB gain at lOkHz and a 2dB attenuation at 
25kHz. 

Due to space limitation it is not possible to show in this paper the evolved circuits’ 
schematics. The reader can refer to [12] to get the netlists of the evolved circuits. 



3.2 Unknown Combination of Signal Sources 

The linear combination of two source signals (lOkHz and 20kHz) was applied to the 
FPTA-2 chip, as previously shown in Figure 3. Nevertheless, the source signals are 
unknown to the evolutionary system, in the sense that no information about the input 
signal is used by the fitness evaluation function. The fitness criteria employed here is 
to amplify the strongest tone at the input signal and attenuate the weakest one, which 
we respectively assume to be signal and noise. Four cells of the chip were used in the 
experiment. Figure 6 summarizes the performance of the evolved circuit. Initially, the 
lOkHz tone has about twice the amplitude of the 20kHz tone. The evolved circuit 
(Cl) output increases the “signal/noise” ratio from 4.1dB to 16.9dB. In this case, the 
lOkHz tone is attenuated by 2.86dB, while the 20kHz tone is attenuated by 15.8dB. If 
we invert the source signal ratios, it is observed that a new circuit configuration, C2, 
is evolved: the “signal/noise” ratio increases from 10.8dB to 18.5dB. In this case, the 
lOkHz tone is attenuated by 12.5dB, while the 20kHz tone is attenuated by 4.8dB. It 
can be observed that the performance is better in the first case. Table 1 compares the 
results in terms of the achieved and target FFT values O, and T,. 




Frequency(Hz) 




0 15,000 30,000 45,000 



Frequency(Hz) 



Fig. 6. In the left, FFT of the input (top) and output (bottom) of circuit Cl when the lOkHz tone 
has twice the amplitude of the 20kHz tone. In the right, the same information is shown for cir- 
cuit C2, when the 20kHz tone has a higher amplitude. 
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Table 1. Comparison between the target and the attained FFT for the two circuits presented in 
this section: circuit 1 (passing lOkFIz ) and circuit2 (passing 20kHz). 





Wanted tone 


Unwanted tone I 


Output(Of) 


Target (Tj) 


Output (Of) 


Target (Tj) 


Circuit 1 


16.1 


>20 


0.33 


0 


Circuit 2 


14.5 


>20 


0.21 


0 



3.3 Signal Separation Experiments 

The goal of this experiment is to design a circuit able to separate two mixed signals, 
E,(t) and Ej(t), obtained by the linear combinations of pure sine waves, e,(t) and ej(t), 
of known frequencies. The circuit outputs are the original pure sine waves. As in the 
previous case studies, we chose for the frequency of the pure sine wave e,(t), fj = 
lOkHz and for the pure sine wave C 2 (t), f^ = 20kHz. These signals were linearly com- 
bined by a mixing matrix to produce the chip inputs Ej(t) and E/t): 







'0.25 


0.5' 


■sin(22rl 0,0000’ 






0.5 


0.25_ 


sin(22r20,0000 



The set of experiments was performed using 10 cells of the EPTA-2. Eollowing the 
same procedure of the previous experiments, the cells were constrained to be invert- 
ing amplifiers reducing thereby the search space. 

Eigure 7 depicts the inputs and outputs of the best circuit achieved in this set of 
executions. 




Fig. 7. Result of the signal separation experiment. At the top the inputs El and E2 are shown. 
At the bottom the outputs 01 (lOkHz)) and 02 (20kHz) are shown. 

Table 2 summarizes the evolved circuit performance in terms of the FET and of the 
values measured in the frequency analyzer. The target FFT values used in the fitness 
function are also included in the table. 
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Table 2. Evolved circuits in the signal separation experiment. Amplitude of lOkHz and 20kHz 
tones as measured by the spectrum analyzer and calculated hy an EET algorithm used during 
evolution. 





lOkHz Tone 


20kHz Tone | 




EFT 


Spectrum 

Analyzer 


FFT 


Spectrum 

Analyzer 


Input E, 


33.6 


-13dB 


8.86 


-20dB 


Input E, 


7.6 


-20dB 


41.1 


-15dB 


Output O, 


36.9 

(Target:T,>20) 


-13dB 


1.07 

(Target:T= 0) 


-35.3dB 


Output Oj 


0.6 

(Target:T= 0) 


-30dB 


84.5 

(Target:T,>20) 


-13dB 



From Table 2, it can be observed that the output O, attenuates the component e^ 
(20kHz) hy -15dB (from -20dB to -35.3dB), while keeping e^ (lOkHz) at the same 
level. On the other hand, the output attenuates the component e, hy -lOdB (from - 
20dB to -30dB), and amplifies e^ by 2dB (from -15dB to -13dB). Finally Figure 8 
plots the fitness of the best individual along the generations averaged over 3 GA runs. 
It can he observed from this graph that the fitness does not go to 0 for the best indi- 
viduals. This is an expected systematic constant due to the way the fitness function 
was written - the minimum possible fitness in this experiment is 80. 




Fig. 8. Fitness against generations for the signal separation experiment. 

This experiment is a first approach to tackle the independent component analyzer 
problem, which consists of recovering the original source signals from signals ac- 
quired from a set of sensors that pick up different linear combinations of source sig- 
nals. 



4 Discussion and Conclusions 

Most of the experimental results could be conventionally implemented using low or 
high pass filters with zeros/poles in the range of 10-20kHz and a control or adaptation 
block. Supposing R=10K, a value that is in the range of the polysilicon on-chip resis- 
tances [4], we would then be talking about O.ljJ, capacitors. In order to implement a 
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capacitor of this size using the chip fabrication process (about IfF/jam for metallic 
capacitors), a silicon area of lOOmm^ would be used, against only about 2mm^ used by 
the evolved circuits. Several factors explain this discrepancy: the resistive effect of 
the switches contributes to increase the RC constant; to a small extent the exploration 
of parasitic capacitors of the MOS transistor; and the capacitances of the probing in- 
struments (lOp for the oscilloscopes). By cascading the filters evolved using a small 
number of cells, the FPTA-2 chip will be able to realize steeper frequency responses 
(possibly with roll-off superior to 60dB/dec). 

It has also been demonstrated that the reconfigurable hardware can evolve to 
“adapt” to changes in the input signal. The task of separating a mixed sine wave input 
(with frequencies unknown a priori) has been approached using a 2x2 neuromorphic 
analog network implemented in VLSI using a total of 76 transistors [1]. The network 
training time was 80ms and they were able to separate the two sine waves by 15dB 
(worst) to 35dB (best). The evolved FPTA-2 hardware used 140 fixed transistors, 
taking a training time of the order of seconds and separating the sine waves by 17dB 
(worst) to 22dB (best). In the case of Evolvable Hardware, we point out that the re- 
configurable chip was not specifically designed for the signal separation task, as it 
happens in the neuromorphic approach. 

The results demonstrate that our evolutionary platform can perform signal separa- 
tion and extraction for input signals consisting of a combination of sine waves, being 
also capable to adapt to different input signals without human intervention. Future 
applications will consist of the separation of more complex signals, such as mixed 
speech signals. 

Among the lessons learned from the experiments reported in this paper, we empha- 
size the advantages of using partly opened/closed switches; and constraining the re- 
configurable circuitry [4] to a particular topology. Future work will focus in extending 
these experiments to evolve filters with more stringent requirements; tackle more real- 
istic case of adaptive noise cancellation; and tackle the problem of separating mixed 
speech signals. 
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Abstract. This paper explores self-assembling artefacts using distri- 
bnted control. General design methods for distributed self-assembling 
artefacts are described and a simnlator is developed to simulate an ex- 
isting robotic nnit (M-TRAN) in a self-assembling context. Two mnlti- 
agent behaviors are implemented in the simulation to test the perfor- 
mance of the distribnted system. The behaviors are compared favorable 
against existing motion planning methods (e.g. clnster flow) for these 
self-assembling modular robots. 



1 Introduction 

Self-assembling artefacts are small robotic units that are able to cooperate, per- 
haps join together to form a more complex structure and change form to reach 
a complex task none of the individual units are able to reach without help from 
others. Such systems may be able to self-assemble and self-repair. Here, we im- 
plement a self-assembling system based on the physical realization of M-TRAN 
modules [IIVI Y' I Dli) . However, contrary to the work by Yoshida et al. [lYMK"*"!)!^ . 
our system will be totally distributed and the units will be able to cooperate to 
reach common goals. The system will be able to change it’s form. 

1.1 M-TRAN 

Many researchers have tried to develop self-reconfigurable robots (CEBOT, 
Robotic Molecules, FRACTA, CONRO, etc.Q but here we will concentrate on 
the Modular Transformer (M-TRAN) module that was developed at AIST in 
collaboration with Prof. S. Murata. The M-TRAN module consists of two semi- 
cylindrical parts connected by a link. Two RC servo motors embedded in the 
link enable each part to rotate 180°. The rotational axis of the two parts are 
parallel giving the module 2 DOF. The module has 6 connecting surfaces, 3 on 
each part. The inactive part uses permanent magnets, the active part has the 
magnets fixed at a distance from the face using non-linear springs. The active 

^ Links to the research done on these modules are supplied at the Hydra web-page 
(http:/ /hydra. mip.sdu.dk) 



A.M. Tyrrell, P.C. Haddow, and J. Torresen (Eds.): ICES 2003, LNCS 2606, pp. 296-^03 2003. 
© Springer- Verlag Berlin Heidelberg 2003 



Distributed Control in Self-reconfigurable Robots 



297 




Fig. 1. The M-TRAN module. 
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Fig. 2. Some examples of the work-range of the M-TRAN module (top view). 



part can disconnect any part connected to it by heating a shape memory al- 
loy spring, canceling the magnetic attraction between the permanent magnets 
fl^MK+Oll . 

The M-TRAN modules are controlled from a central computer sending spe- 
cific commands to each module. Although the modules do have on-board process- 
ing power, they are entirely controlled from the external computer. The second 
prototype of the M-TRAN module will have the inter-module-communication- 
capabilities needed for Multi- Agent behavior. 

The M-TRAN modules have a very nice property when used in a lattice- 
type of structure. The way they connect ensures the lattice structure to follow 
a checkerboard pattern, and they are well suited for thread-like structures, al- 
though the mobility of the module is very limited. 

1.2 The Work-Range 

The module has two parts that are connected by a link. In the link, two actuators 
are embedded allowing each part to be moved in a 180° range with respect to 
the link (see fig. EJ. 

The M-TRAN module has two different operating modes with respect to 
the base of the module (the base is simply the part of the module that is kept 
fixed during a movement). 1) Roll: In this mode the actuators move the module 
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in the vertical plane, the base is one of the end-faces. 2) Pivot: In this mode 
the actuators move the module in the horizontal plane, the base is one of the 
side-faces. With these two modes, the modules are able to move around on the 
ground (in one plane). 

To summarize the properties of the M-TRAN Module: 

~ Actuators: 2, ±90°. Parallel rotational axis. 

— Connectors: 6, 3 x permanent magnets, 3 x permanent magnets mounted 

on a plate with springs and one SMA spring. 

— Communication: 6 bidirectional serial ports. 

The M-TRAN modules use wired communication in both its prototypes. 
The first prototype only has one communication channel allowing the master 
computer to broadcast the commands to the modules using hard-coded IDs. 
The second prototype will have bidirectional serial communication allowing the 
modules to communicate with each other. 

2 Simulation 

For our simulations, we used Vortex, a commercial real-time physics engine for 
creating rigid bodies simulation and games, developed by CMLabs. It handles 
collision detection between the models as well as dynamic control (gravity, force- 
feedback, collisions etc.). 

The physical model of the M-TRAN module is defined in the XML-language 
supported by Vortex, this model is then imported into the C/C±± program by 
calling the generic methods in the API to Vortex. 

Modeling the Actuators. The actuators in the M-TRAN module are embed- 
ded in the link between the active and inactive parts. They are able to rotate 
each part 180°. In Vortex the actuators are implemented as /imge-joints with 
limits at ±90°, and positional encoders, enabling the programmer to ask the 
actuators at which angle they are positioned. 

Modeling the Attachment Mechanism. The docking mechanism used in the 
M-TRAN modules is fairly simple. The central component in the attachment is 
the magnets. The magnets have enough attracting power to keep the connection 
between two modules while lifting one of the modules by the actuators on the 
base module. When the connection is to be broken, a small Shape-Memory- Alloy 
(SMA) spring is heated, the SMA’s force combined with two non-linear springs 
is enough to cancel the magnetic force attracting the permanent magnets to each 
other. 

First of all, for the attachment mechanism, a callback-function uses the col- 
lision library to detect collision between the active part and the inactive part. 
When this occurs the distance and angle between the two parts are compared, 
and if everything is ok, the two modules are locked together. However, this 
method must be extended, because before the collision library will try to estab- 
lish a connection, the two faces must collide. A magnetic connection mechanism 
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Fig. 3. Left: A 16-module structure of M-TRAN Modules. Right: Signatures and 
roles of the modules. 



is able to attract the surfaces to each other even if they are not touching each 
other. Therefore, we introduce a periodical attempt at connection (the parts 
doesn’t have to touch to be connected) and the possibility of using tempera- 
ture in the simulation, to make all bodies vibrate in random directions using 
gaussian-noise. Finally, we must model the magnetic force around the magnetic 
faces. The force follows eq. □ 

I/I = maxforce • ^ (1) 

where / is the force acting on both surfaces, D is the range of the magnetic 
force and r is the distance between the magnets. This formula ensures that the 
magnetic force varies from 100% to 10% of the maxforce within the range of the 
magnets. Only one force is modelled for each connecting face (3 active faces per 
module). 

Designing the Behaviors. Behaviors that are to be implemented in the Sim- 
ulator are designed using Petri Nets |Pet62j and Class-diagrams from Unified 
Modelling Language (UML). 

The structure used in this work will be constructed by using composite mod- 
ules consisting of 4 M-TRAN modules connected 2 and 2 to form a firm structure 
with the ability to connect in all 6 directions (see left side of fig. 0- The compos- 
ite unit contains two parallel “lower” modules that connects two parallel “upper” 
modules (perpendicular to the lower modules) to each other. 



3 Walking Behavior 

The walking behavior is the behavior that enables the cluster of modules to move 
in one direction. The strategy is to move modules from the rear to the front of the 
structure. The goal of the behavior is to start a migration of modules from the 
rear of the structure to the front. The moving modules should ask each module 
it visits whether it has reached its destination or not. 
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Fig. 4. Trace of the movement path for the walkers. Left: Lower Walker. Right: Upper 
Walker. 



Related Work. One way of making the walking behavior is the “Cluster Flow” 
described in [ lyMK+oT^ . This behavior makes the modules move to the front one 
at a time. The upper modules use the other upper- modules to help them move 
along the side of the structure and the lower modules simply walks along the 
lower side of the structure. We also implemented this in our Vortex simulation, 
and a video of this behavior (Walking. Alpha. mpg) can be seen on the Hydra 
web-page (http://hydra.mip.sdu.dk). However, cluster flow is extremely ineffi- 
cient. A substantial amount of time is used to get the upper-modules to help the 
upper-walker because of the way the structure is organized. 



4-unit Walker. Another possibility, that we explore here, is to let the upper 
and lower modules work together to reach the front of the structure. The idea 
here is that since the lower-module simply walks along the side of the structure 
- and the functional unit is a 4-module unit - the lower module should connect 
with the upper- module and together they should form a “micro” -walker and 
then walk to the front together. 



3.1 Petri Net 

The Walking Behavior is designed using Petri nets. The modules are assigned 
roles as the behavior is executed. The roles are illustrated in fig. Q A module 
will change its role during the execution of a behavior. A LW will end up as a 
LH when the walking is complete. An Upper Helper UH will become an UW 
and finally an US (if it ends up in the “spine” row). 

The path the LW and UW follows are shown in fig. 0 

As an example, the Petrin net for starting a walking movement is shown in 
fig. □ 

4 Morphing Behavior 

The morphing behavior uses the hormone propagation to determine the center 
of the structure. This center is then used as the “stopping-point” for the walking 
behavior. 
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Get Ready for Walking 




From "Start Walking” 

Init (US) . " " 

Move out of the way (LW) 



Turn towards LW (LH) 





Go back in the grid (LH) 



Go to the' side (UW) 



- 

I From "Stop Behavior" (US) 




To "Walking" (LW) 



Fig. 5. An example petri net: The Get Ready for Walking behavior. The roles and 
signatures of the modules are shown in fig. 0 right. 



4.1 Hormone Propagation 

A program is propagated to the end of the spine. When a module determines that 
it is an end-module (no connections on the female end) a hormone is sent through 
the spine from the active to the inactive part on the next module, each module 
stores the value of the hormone in the memory, increments the value and sends 
the hormone to the next module (on the inactive connector). This procedure 
is executed until the other end of the spine is reached. Here another hormone 
is send back doing exactly the same, but storing the value of the hormone in 
another memory location (see fig. OJ • 

The end is determined to be the right-most spine-module. This module ini- 
tializes the A hormone (see fig. E) to 0, increments the value, and sends it to the 
module to the left. This procedure is repeated until the left-most module receives 
the hormone. This module has no other modules to send the incremented value 
to, it is the end-module. The end-module then sends a different hormone, B, back 
through the structure. When the front-spine receives the B hormone it knows 
that all modules (in the spine) have valid values for the A and B hormone, con- 
stituting two gradients through the spine. The mid modules can be determined 
by \A-B\< 1. 
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Hormones 



B 

A 



0 I 1 I 2 ^ 3 

3 ' 2 ' 1 ' 0 




Spine 



Fig. 6. Lfsing hormones to determine the mid-point of the structure. 



When this cycle is ended, all modules have two gradients indicating how 
far they are positioned from the structure’s head and tale. If the spine-module 
determines it is in the center of the structure, it signals the lower modules that 
they are the “end-of- walking” -modules. 

After the hormone propagation is ended, the structure starts the walking 
behavior again, here the Stop-Walking will indicate when the module has reached 
the new position. The “Return to Lattice” behavior is altered to reorganize at 
the stopping point. 

5 Deadlocks and Collision Detection 

The use of petri nets makes it possible to determine where dead-lock can occur. 
If a module fails while having a token that is supposed to be returned to another 
module (if there exists a transition in the net with arcs to other modules) the 
other module will never gain control of the behavior and the execution of the 
behavior will block. The dynamic design issue indicates that no module is allowed 
to go into a waiting loop for a substantial amount of timeEl This constraint is 
applied to the system to be able to detect failing modules. If a module awaits a 
token from another module, a time-out mechanism can be build-in to determine 
when a module fails to accomplish a parallel task. 

Researchers often run into the problem in a simulation with distributed con- 
trol, the collision detection may be very difficult (or viewed as impossible). The 
simulator described in [ tyMK+fli^ uses a global scheme planner to traverse all the 
graphs representing possible movements of the individual modules. A solution 
is discarded if it makes two modules collide. This will ensure, that the program 
transferred from the central computer to the modules only do “valid” actions. 

When using distributed control, collision detection becomes much harder, 
since no single entity knows the configuration of the whole structure. In the 
simulator used in this work, a module only knows if it is connected to a module 
or not. This makes collision detection extremely difficult, since the modules don’t 
know anything about modules that are not their directly connected neighbours. 

5.1 Locking as Collision Avoidance 

To make sure a module doesn’t execute an action that will make it collide with 
another module, the programming of the modules has to take the structure into 

^ substantial amount of time is context specific, no general value can be used 
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, Controlling Module 




Fig. 7. The critical section, the inquiring module has to get a lock on the controlling 
module to avoid collisions with other modules wanting to occupy the critical section. 




Fig. 8. Dead-lock condition with two moving modules, each modules blocks until they 
obtain the lock on the base to their respective critical sections, which will never happen. 



account. Since a “critical” section is an empty space the critical section can not 
be locked directly. That is why the organization of the structure must be taken 
into account. Instead of locking the critical section a module that wants to use 
the critical section has to get a lock on the module that “controls” the critical 
section (see fig. Q. 

Semaphores. In our case, a critical section in the simulator is not a piece of 
code, but a space in the structure, so the critical section can not be locked as 
easily as in the case of concurrent programming. Instead, we define each module 
as both a processing unit and a resource, this way a module can access another 
module exclusively, e.g. when using it as a base for a movement. 

However, semaphores introduce the possibility of dead-locks. In fig. 0a possi- 
ble dead-lock situation is depicted where two module move in opposite directions 
in the structure, each module has a lock on its “base” and want to obtain a lock 
on the next module in line (unfortunately this module constitutes the base of 
the other moving-module). 

5.2 The “Walking” Behavior and Collision 

The walking behavior described earlier uses the locking to ensure an area is free 
before moving into it. This method uses extensive knowledge of the structure. It 
is divided into the following steps. 

Base Lock. Whenever a module uses another module as a base, the active 
module will lock the base-module. 
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Self Lock. Whenever a module wants to move into an area with the intend of 
connecting to a base-module, the module is queried (through the structure) 
and asked to obtain a self-lock, to ensure no other modules uses it as a base 
or intend to move to the module as a base. 

This schema works as long as the programming of the modules are carefully 
planned. The locking mechanism locks a whole module, not only the active or 
inactive part. Though this method can be used to avoid collision between mod- 
ules, it will introduce unnecessary stalling of the system, if a module is locked 
without being critical (e.g. to be used as a mediator of messages). 

6 Results 

The walking behavior will move at least 4 units from the rear of the structure 
to the front. The 4 units will move in pairs forming micro-walkers. 

The walking behavior is made of three parts 

1. Get Ready for Walking - This is the first stage of the walking behavior, here 
the micro-walkers are connected on the side of the structure. The assembling 
of the micro-walker is different for the two micro-walkers constituting the 
4-unit composite. 

2. Walking - This is the universal part of the behavior, it applies to all the 
micro-walkers. This sub-behavior will obtain a lock on the next lower-module 
(to make sure no other walking modules are in the position the micro-walker 
desires to go to), then the step is completed and the lower-helper module is 
asked whether it is an end-module, if no, the walking continues. 

3. Return to Lattice - This is the last stage of the walking, here the micro- 
walkers are returned to the lattice. This sub-behavior has two implemen- 
tations; one for the Walking Behavior and one for the Morphing Behavior. 
Each micro-walker is treated differently (depending on the space left for the 
walker) . 

The behavior is monitored by tracing the single module’s two parts. A trace of 
the first micro-walker step in a 16-module configuration is shown in fig. 01 

The morphing behavior transforms one row into two rows. It utilizes the 
Walking Behavior to move the modules to the correct position. In the design 
phase the design of the morphing behavior was the following: 

1. Detect Mid-point in structure - Use hormone-propagation to create two gra- 
dients in the upper spine modules, and use these gradients to detect the 
mid-point of the structure, then make sure the lower-helper modules that 
are in the middle of the structure knows they are end-point modules (when 
queried from the lower-walker) . 

2. Initiate walking from both sides when the walking is initiated at both ends of 
the structure, the morphing behavior is fast compared to the walking (two 
walking behaviors are executed in parallel). 
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Fig. 9. Stills of the Morphing Behavior with 16 modules. 



This was the initial design of the Morphing Behavior. When the behavior finally 
was implemented the detection of the mid-point was done as described, and 
posed no problems. But the walking from both sides was discarded because it 
would require a “walking from front to rear” behavior that would be trivial to 
implement, but very tiresome since it would be a mirror of the “walking from 
rear to front” behavior. Had it been implemented, the morphing behavior would 
be prettier to look at, but the resulting state of the structure would still be the 
same. A morphing behavior can be seen in fig. Q 

6.1 Communication 

One of the topics of concern when discussing distributed computing, is the com- 
munication load of the system. When executing a behavior, 3 different types of 
communication is used: 1) Requests - When a module wants another module to 
perform some action or behavior, the module will send a request to the help- 
ing module. 2) Synchronization - When two modules interact, and their actions 
need to be synchronized, they communicate using simple synchronization com- 
mands. 3) Propagation - When a module has no knowledge of the program it 
was requested to execute, the module will ask the initiator to send that program, 
propagating the knowledge to the module that needs it. 

Requests and Synchronization commands are fairly simple and the data trans- 
ferred is small compared to the Propagation of knowledge. 

Walking. This section describes the results obtained from the simulation using 
a fully distributed setup. That is, all modules are preprogrammed to contain 
the programs they need to complete the “Walking Behavior” . In the experiment 
the lower front module is asked to start the walking. The Walking Behavior 
is continued until 12 modules has been moved from the rear to the front. The 
experiment is run for 12, 16, 20, . . . , 36 modules. 

As the graphs in fig. [Cl] shows, the communication doesn’t explode as the 
number of modules increases. The correlation between the communication load 
and the number of modules is almost linear {Rtime = 0.990 and Rlomm = 0.998). 
The average communication per module is approximately constant. 
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(a) Time (b) Communication 

Fig. 10. The time and communication used to move 12 modules with no propagation 
of programs. 

Table 1. Data from morphing experiment with 16, 24 and 32 modules. 



No. Modules 


Time used 


Communication 


Comm, per module 


16 


146.425 


1901 


119 


24 


214.600 


3778 


157 


32 


282.775 


6196 


194 



Table 2. The linear fits to the data collected in table 0 



type 


slope 


offset 


R‘‘ 


Time used 


4.68 


44.4 


0.9997 


Communication 


268.44 


2484.2 


0.9947 


Comm, per module 


8.52 


10.1 


1.0000 



Morphing. The morphing behavior is executed from the lower front module 
with 16, 24 and 32 modules. The measured data is the time for the morphing 
behavior to be completed, that is for the last module in the second row to connect 
fully with the structure and the amount of data transferred to accomplish the 
goal, these values are listed in table ^with the addition of a column containing 
the average communication per module (to see how well the behavior scales). 

The three value are still approximately linear (all thought the test-set still 
is too small to say for sure). The linear fits from the three series are shown in 
table 0 

The slope of the communication per module is positive, meaning that the 
communication per module increases as the number of modules increase. Since 
the modules can only by increased by multiples of 8 (2 composite units ~ 4 
micro-walkers ^ 8 modules) the number of modules that need to be moved 
increases as well as the distance they need to travel. 

In a 16-module structure 8 modules need to be relocated from the rear to the 
side of the structure, meaning 4 micro-walkers need to travel across 7,5,3 and 
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1 module (the distance is shortened in both ends). In a 24-module structure 12 
modules need to be relocated, i.e. 6 micro-walkers will travel across 11,9,7,5,3 
and 1 module. 

7 Conclusion 

The “Walking Behavior” can be compared to the “Cluster Flow” described in 
IIYMK+Oll . The “Cluster Flow” sends one module to the front before sending the 
next module. The modules move in the layer they belong to (the upper modules 
use the upper modules to move, the lower modules only use the lower modules 
to move). 

Compared to this “Cluster Flow”, the “Walking Behavior” in this work is 
much more efficient. 

1. The modules are sent on their way continuously from the rear using local 
communication . 

2. The modules from the two layers cooperate to make a “micro” -walker that 
are able to walk on the side of the structure. This enables the modules to 
use the lower-modules as connecting surface for the entire walk. 

On the other hand, the behavior implemented in the simulation has a delay 
from the stop-condition is set to the behavior actually stops (because the last 
two micro-walkers have to get to the rear, and there may be several “walkers” 
on the side of the structure). 

Regarding communication load, the presented results for the Walking Behav- 
ior and the Morphing Behavior show that the system scales very well in number 
of units. The communication load increases linearly with the number of units, 
and the average communication per module goes toward a constant value as the 
number of modules increase. This is due to the localized communication. 
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Abstract. Reports on evolutionary robotics systems have so far been 
on evolving controllers that make simple robots do simple tasks in simple 
environments. In this paper we try to stress the evolutionary robotics ap- 
proach by evolving a controller for a more complex task, namely Khepera 
robot soccer, and evaluate evolved controller performance against hand- 
coded controllers. We present a system that uses competitive co-evolution 
to develop robot controllers for the task. The system is described, and 
performance of the system is documented. Co-evolution is tested against 
single-population evolution, and it is concluded that co-evolution has 
the ability to produce more robust individuals with respect to opponent 
strategies. 



1 Introduction 

Many of the evolutionary robotics systems that have been reported so far, has 
documented successful generation of robot controllers for simple robots perform- 
ing simple tasks in simple environments. In this paper we will try to go one step 
further and apply the method to a more complex and competitive task, namely 
the Khepera Robot Soccer task. Using this task as a testbed has two advantages: 
1) No known optimal strategy exists. 2) Real-world competitions exist, so that 
evolved controllers can be tested against hand-coded controllers in a neutral 
competitive scenario. Given the competitive nature of the task, the task lends 
itself to artificial co-evolution. The emphasis of this paper is on co-evolution 
applied to this task, and on the viability of the evolutionary robotics approach 
in general. 



1.1 The Khepera Robot Soccer Task 

In a Khepera robot soccer match two Khepera robots are pitted against each 
other in an arena. The arena is 105cm long and 68cm wide with cut off corners, as 
shown in figure E The Khepera robot is a widely used miniature battery-driven 
robot with on-board processing capabilities from K-Team. The Khepera used for 
soccer playing is equipped with 8 IR-sensors, wheel encoders and a camera. 

A match consists of five rounds. A round lasts until a goal is scored, until 
the ball has not moved for thirty seconds, or until four minutes have elapsed. At 



A.M. Tyrrell, P.C. Haddow, and J. Torresen (Eds.): ICES 2003, LNCS 2606, pp. 308-|31^ 2003. 
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Fig. 1. A picture of the football arena. 




Fig. 2. The implemented system is an evolutionary robotics system using a simulator 
to speed up evolution. 



the beginning of each round, the players are positioned on random symmetric 
starting points. The ball starts at the center of the pitch. Each player starts 
facing the opponent’s goal line at random, symmetric starting positions. 

2 Approach 

We decided to use an evolutionary robotics approach, and to construct a sim- 
ulator to speed up evolution. The implemented system is shown in figure 0 
After a pre-specified number of generations in the simulator, the best perform- 
ing controller is translated into a binary file that can be executed directly on a 
Khepera robot. The following sections will go into more detail about each of the 
components of the system. 

3 Co-evolution 

Since D. Hills, |Hil92J . showed that better sorting networks could be produced 
when using co-evolution, co-evolutionary methods have spread to other fields 
of research where Evolutionary Algorithms are being used, and 

[ [BLP96 II . The task of evolving a robot football player intuitively lends itself to co- 
evolutionary methods because of the competitive nature of the problem. Given 
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Fig. 3. Illustration of the idea behind co-evolution. 



a specific opponent, it might be possible to find the best strategy, but there is 
no single strategy that is best against all opponents. The task is thus to find a 
player that performs well against a large number of opponent strategies. 

An attempt to illustrate competitive co-evolution for two populations can be 
seen in figure 0 When one population (A) finds a good locally optimal strategy, 
the competing population (B) is encouraged to find a solution that performs 
especially well against the other populations strategy. This changes the fitness 
landscape for population (A), forcing evolution to find another solution to the 
problem. 

The effect of one population changing the fitness landscape of the other 
population is called the Red Queen Effeet, described in [ U1VL95I . The Red Queen 
Effect has been examined by co-evolving pursuit and evasion behavior, both in 
simulation and on real robots [IK IN IVIOI^ . In [IkNMfixJ and psng, D. 

Floreano et al. investigate the dynamics of competitive co-evolutionary systems, 
and show that the Red Queen Effect can give rise to “cycles” between alternative 
classes of strategies and can make it difficult to monitor evolutionary progress. 
They also state that co-evolution can be used with benefit when the task is such, 
that the best strategy of one population is dependent on the strategy of the other 
population. 

3.1 The Evolutionary Algorithm 

The co-evolutionary algorithm was implemented as two separate GA’s. There 
is no exchange of gene material between the two populations. After much trial 
and error, good evolution parameters were found to be a population size of 50, 
an elite of 10, and these 10 were also used for reproduction. The probability of 
mutation was 5% per gene (byte), and there was no cross-over. 

For evaluation, individuals played 25 simulated rounds against individuals 
from the opponent population (every second player, alternating). The following 
fitness fuction was used; 

— 10.000 per goal, -10.000 for own goalO 

— 2.5 per second remaining of the round when scored a goal. 

— -0.01 per cm to the ball when the round ends. 

The player getting (mis)credit for the goal, is the player that most recently touched 
the ball. 



1 
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The fitness for one round was the sum of the three components. The first 
component dominates the fitness. The second and third components are only 
relevant in cases where an equal number of goals have been scored. The second 
component favors quick goals, and the third component favors robots that stay 
close to the ball. The third component is meant for bootstrapping the evolution- 
ary process. 

Design of the fitness function was a guess based on intuition. It turned out 
to serve it’s purpose, and no further experimentations were performed. 

4 The Simulator 

A central issue in any evolutionary robotics systems is the evaluation of can- 
didate controllers during evolution. It was decided to perform the evaluations 
in a simulation modelling the Khperea soccer world for a number of practical 
reasons. Using a table based model ILVUNLilbl was rejected due to the large state- 
space of the world. Instead a hybrid between the minimal simulation approach, 
suggested by Nick Jacobi IllacfiiSal . and a geometric model is used. Basically ev- 
erything that easily could be modeled geometrical is modeled geometrically, and 
then the rest is made unreliable by adding noiseQ 

The simulator was constructed using iterative refinements, where each ver- 
sion of the simulator was tested by observing differences in the simulated robot 
behavior and the real world robot behavior. 

4.1 The Use of Minimal Simulation 

The interaction between the different world objects in the soccer arena is very 
difficult to model accurately IbmiDbl . A minimal simulation approach could be 
to make the contact between world objects unreliable for the robot controllers, 
as described in ILIac98bl . By making the simulated environment more unreliable 
than the real environment, evolved controllers would be forced to be robust, 
thus increasing the chance that the controller will perform well when transferred 
to a real robot. In the simulation, interaction between world objects is split up 
into cases, and specialized code deals with each specific case. For each cases, 
interaction was modeled according to observation as closely as possible and then 
noise was applied both to robot and ball movement. 

As stated in pCH92j and [IMC96J . producing simulations of visual sensing is 
a very time-consuming task, both for the programmer when building the simula- 
tor, and for the computer during simulation. Instead of simulating the Khepera 
camera pixel by pixel, the output from the pre-processors were simulated. 

5 Behavior Representation 

The robot controller is a fixed-structure tree of arbitrators, as shown in figure E 
Modules 0-6 are arbitrator modules, and modules 7-42 are primitive actions. At 



^ An applet can be seen at http://www.mip.sdu.dk/~esben/EvoRobSoc/applet/ 
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Fig. 4. Architecture of the controller. A tree of behavior modules. 




Fig. 5. The two types of arbitrators. Control is propagated down to a sub module. In 
the sequential arbitrator, control is transferred to the sub module corresponding to the 
state of a finite state automaton. In the “reactive” arbitrator, control is transferred to 
the sub module corresponding to condition with the highest activation. The activation 
goes high when the condition is true, and then decays exponentially when the condition 
is false. The height and slope is determined by genes. 



each 100 ms time step, control is propagated down through the tree to one of 
the 36 primitives, which then has full control of the robot for the duration of 
the time step. The architecture is based on the task decomposition approach 
described by W. Lee, J. Hallam and H. H. Lund [ITiHLQTj . 

Two types of arbitrators were used, implementing sequential and reactive 
arbitration, both illustrated in figure Q Both arbitrators uses a set of conditions 
to arbitrate between six sub modules, and transfer control to only one of them. 
The conditions are implemented as fixed size boolean expressions. Figure l^hows 
the structure of the condition trees (a), and an example (b). The bottom four 
nodes of the condition trees can be constant values or current sensor reading of 
the robot. 

The contents of the nodes in the behavior and condition trees was coded by 
the gene strings. A genotype is a string of 638 bytes, corresponding to a search 
space of about 36 bytes code for the primitve actions, whereas the rest 

code for the conditions. 

5.1 The Behavior Primitives 

Generalizations of the behavior primitives used in a successful hand-coded robot 
football player are used as building blocks for the evolved controllers. The mo- 



Co-evolving Complex Robot Behavior 



313 



Bit 


^ 1 * 1 ^ 


4 3 


^ 1 ‘ 


0 




Gene 
(1 byte) 


XXX 


X 


X 


X X 


X 








Reverse turn direction 



Translation speed Rotation speed Turn direction 




Fig. 6. A gene coding for a primitive action. All speeds are in Khepera speed units. 
Rotation speed is the difference in speeds for the two motors. 



tor primitive are combinations of three parameters; translation speed, rotation 
speed, and rotation direction, as shown in figure 0 Note that for example action 
number 001 — 01 — 10 — 0 will make the robot move forward while turning toward 
the ball. 

The sensor primitive are output from preprocessors, not the raw Khepera 
sensor data. 

— The proximity sensors, filtered so the returned value is the minimum of the 
last three measurements, and then paired two and two, as in 

— The output of the image processing algorithm. The algorithm returns the 
centroid and width of the ball-blob, if the ball is in the image. 

— A “stuck sensor” that uses the position counters to determine whether the 
robot is stuck or not. 

— A sense of direction, that uses the incremental encoders to give an approxi- 
mated heading of the robot. 

These motor and sensor primitives are quite high level compared to what 
is described in other systems in the studied literature. M. Mataric and D. Cliff 
wrote: “This abstraction of representation allows for significantly reducing the 
search space and can greatly accelerate the evolutionary process” [ IIV1C9KJ . A 
counter argument is, that using higher level primitives reduces the search space so 
fewer potential solutions are available to evolution. Also, higher level primitives 
require more work for the programmer. 



6 Tests and Results 

The system was tested both on performance in the simulation, an on the per- 
formance in the real world. Also, a number of tests were performed to evaluate 
the use of co-evolution. 
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Fig. 7 . The structure of a condition, (a) the general structure, (b) an example, 
(a) (b) 





Fig. 8. Different evolved football strategies. 



6.1 Behavior in Simulator 

The first test was to test weather the given architecture and primitives were 
sufficient to make a simulated robots play soccer. Initial runs revealed that after 
about 250 generations the best individuals of both populations were able to score 
about 80% of the time in the simulator, when alone in the arena. This certainly 
shows that the set of primitives used is sufficiently expressive, but it also gives 
rise to some concern with respect to the realism of the simulator, since 80% 
scoring rate is very good, even for a hand coded robot. 

Evolved Strategy. Two distinct strategies for approaching the ball were ob- 
served in the evolved robots. The two types are shown in figure 0(a) and (b). 
The figure shows one of the difficult situations, where the robot has to go back 
and get behind the ball to avoid scoring an own goal. Neither of these strategies 
resemble the typical hand-coded strategies. 

6.2 Testing Real-World Performance 

A severe drop in fitness was observed when robots are transferred to reality. In 
the real world, the evolved robots would typically get a fitness of about 1000, 
which is quite far from the about 8000 typically obtained in simulation. This 
difference, known as the reality gap, is in this application probably mainly due 
to imperfect modelling of skidding and world object interactions. 

The system was really put to the test, when an evolved robot controller 
participated in the Danish Robot Football Competition, Dec. 1999 under the 
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Fig. 9. The plotted position of the robot Brute Force and the ball in the simulator. 
The ball starts to move when Brute Force pushes the ball. 

name Brute Force. Videos can be found on http://www.mip.sdu.dk/~esben/ 
EvoRobSoc, and is the main source of documentation for the real robot perfor- 
mance. In the preliminary matches, Brute Force won two times and had one 
draw, which was enough to qualify for the semi finals. In the semi final Brute 
Force won 1-0 without much difficulties, but lost in the final against KITT, 
partly due to a dead battery in the second round (The evening before the final. 
Brute Force won 3-0 against KITT in a test match). An other robot soccer player 
evolved with the system participated in the FIRA2002 WorldCup KheperaSot 
tournament, where it won the second place. 

The behavior of Brute Force is quite different from the behavior of the hand- 
coded competitors, being more in direct contact with the ball. Even though 
Brute Force was moving slower than its opponents, it still won many games by 
better maneuvering. Brute Forceps behavior in the simulator is shown in figure 0 
By observation, this behavior corresponds to the behavior in the real world. An 
applet showing more Brute Force behavior can be seen on the papers home 
page0. 

6.3 Testing Co-evolution, Chasing the Red Queen 

Two different approaches were tried to track the Red Queen. One approach 
is evaluating individuals against reference players. The other approach is the 
Master Tournament method, described in |EJN98| . Four tests were set up to test 
the effect of co-evolution. 



Is Co-evolution More Efficient than Single Population Evolution with 
Respect to Use of Computing Power? Two evolutionary setups were im- 
plemented. One with co-evolution of 2 x 50 individuals in the populations, and 
one single population evolution with 100 individuals in a population (players 



® http://www.mip.sdu.dk/~esben/EvoRobSoc/ 
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Generations Generations 



Fig. 10. Co-evolntion vs. single popnlation evolution, test 1. Performance against a 
reference player of the co-evolution with two populations of 50 robots (a), and a single 
population of 100 robots (b). 



are siblings). In both cases, performance was measured a-posteriori against a 
reference player. 

Five runs were started with each setup, results are shown in figure El The 
single population evolutionary algorithm seems to climb faster, and seems to 
reach higher values. To test whether there is a statistical significant difference 
between the performance when using evolution or co-evolution, the average fit- 
ness of the last 100 generations was calculated for each population. From these 
data, statistics show that co-evolution performs significantly worse than single 
population evolution. 

Does Co-evolution Hurt Evolution? Another test was designed to test 
whether co-evolution made the evolutionary algorithm worse. In this test, the 
population size for the single population evolutionary algorithm was decreased 
to 50 and compared to 2 x 50 individuals in the co-evolutionary algorithm, so 
that the single population evolutionary algorithm runs with half as many fitness 
evaluations as the co-evolutionary algorithm. 

Six runs, each 1000 generations, were made with the single population algo- 
rithm, and compared to five co-evolution runs. Results can be seen on the left 
half of figure Iffl 

The data shows that the difference between the two algorithms is not signif- 
icant. The co-evolutionary algorithm shows no advantage over the single popu- 
lation algorithm in our data. Since the co-evolutionary algorithm uses twice as 
many fitness evaluations, the single population algorithm is to be preferred. 

Does Co-evolution Win in the Long Run? Due to arms-race dynamics, co- 
evolution might have an advantage after some time, when the single-population 
evolution reaches a plateau. A test was made in which evolutions from the left 
graph of figure ^3^ere continued for another 2000 generations. The result of this 
run can be seen on the right graph in figure ^Qand shows no significant difference 
in the two methods (at 5% confidence level). Over the last 100 generations, the 
single population evolutionary algorithm has a higher average value than the 
co-evolutionary, but the difference is not significant. 
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Contimjad comparison between ordinary evolution and co-evolution 




Fig. 11. Left: Co-evolution vs. single population evolution. The graph shows the aver- 
age and the 95% confidence interval of the elite htness-evaluations averaged over both 
populations. The thin line is the single population evolutionary algorithm (6 runs), the 
thicker line is the co-evolution one (5 runs). Right: A continuation of the graph shown 
on the left. 



Does Co-evolution Produce More Robust Results? A fourth test was run 
to test whether the robots evolved with co-evolution were more robust with re- 
spect to handling different opponent strategies than the ones evolved with single 
population evolution, as argued by Stefano Nolfi and Dario Floreano in [INF98] . 
An experiment was set up, in which 10 co-evolved individuals played against 6 
single population evolved individuals. The co-evolved individuals were chosen to 
be the best individuals from each population from the previous experiment, ac- 
cording to the master tournament method. For the 6 single population runs, the 
chosen individuals were the best individuals from the final generation of each run. 
Each of the co-evolved individuals was set to play 100 rounds against each of the 
individuals evolved with single population evolution. Each co-evolved individual 
thus played 600 rounds, while the other individuals played 1000 rounds. The 
average fitnesses obtained from these rounds are shown in the table in figure oi 
The table shows a significantly higher average fitness for co-evolved individuals. 

7 Discussion and Conclusion 

A system has been implemented, that successfully uses evolution to generates 
controllers for the Khepera Robot Soccer task. Using this system, co-evolution is 
compared to single population evolution through four tests. The first three tests 
showed no benefit from using co-evolution. Single population evolution seems 
to converge faster to good solutions than co-evolutionary algorithms. However, 
the fourth test revealed, that the co-evolved individuals have developed more 
robust strategies than the single population evolved individuals. This supports 
D. Floreano and S. Nolfi’s hypothesis, that “co-evolution can have a higher 
adaptive power than evolution” fFN98| . This series of experiments also shows, 
that the effect of co-evolution can be very difficult to measure. Under what 
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Average fitness for individuals 


Player No. 


Without co-evolution 


With eo-evolution 






Pop. A 


Pop. B 


1 


2306.11 


3298.83 


1481.15 


2 


2277.18 


5349.24 


4278.84 


3 


2772.82 


3200.50 


3432.08 


4 


1110.28 


3011.15 


6223.48 


5 


3964.32 


4182.49 


4476.53 


6 


245.54 






Average 


2112.71 


3893.43 


Std.Dev. 


1297.68 


1323.49 



Fig. 12. Single population evolved players playing against co-evolved players. 



prerequisites the positive effect of co-evolution will take place is more or less 
unexplored territory. 

The objective of the work was to explore the co-evolutionary robotics ap- 
proach and to test whether it could be used to evolve behavior for the Khepera 
robot soccer task. This seems to be the case. However, existing theories still need 
further development in several areas to reduce the amount of intuition required 
to build such a system. The problems of constructing simulators of a sufficiently 
high fidelity seems to be the major obstacle for the viability of the approach 
taken in this paper. Other problems are the heuristics involved in deciding be- 
havior architecture and sensor and motor primitives. 

In 1992, R. Brooks wrote “To compete with hand coding techniques it will be 
necessary to automatically evolve programs that are one to two orders of mag- 
nitude more complex than those previously reported in any domain.” [IHro92J . 
This goal seems to have been reached for the present task, since the controllers 
evolved with this system plays football at least as well as most of the hand coded 
controllers in the Khepera robot soccer competitions. 
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Abstract. In Yamauchi and Beer explored the abilities of continu- 
ous time recurrent neural networks (CTRNNs) to display reinforcement- 
learning like abilities. The investigated tasks were generation and learn- 
ing of short bit sequences. This “learning” came about without mod- 
ifications of synaptic strengths, but simply from internal dynamics of 
the evolved networks. In this paper this approach will be extended to 
two embodied agent tasks, where simulated robots have acquire and 
retain “knowledge” while moving around different mazes. The evolved 
controllers are analyzed and the results are discussed. 



1 Introduction 

Yamauchi and Beer ^ showed in 1994 that continuous time recurrent neural 
networks (CTRNNs) can be evolved to display reinforcement learning-like abili- 
ties. The investigated tasks were generation and learning of short bit sequences. 
These tasks were accomplished without modifications of synaptic strengths, but 
simply by exploiting the rich internal dynamics of this class of networks. The 
claim was that the kind of “learning” showed by the networks is more biological 
plausible than more traditional approaches to learning such as reinforcement 
learning g]. In order to justify their choice of bit sequence tasks, the authors 
made the analogy that a rat, when navigating a maze, is faced with a sequence 
binary decisions of which way to turn. Despite of this analogy the tasks studied 
were very artificial seen from an evolutionary robotics perspective. 

This issue will be picked up upon in this paper by applying a similar approach 
in a more “realistic” setting of a simulated robot navigating a maze. In the 
first of several trials in an environment the robot has to a acquire and retain 
“knowledge” which can then later on be exploited. In order to show the feasibility 
of the approach the first task investigated is a simple setup where the movement 
of the robot is constrained to one line. In the second task the robot has to 
navigate in the more challenging environment of a T-maze. The learning-like 
properties of a successful controller is analyzed and it is shown how to modulate 
the robot behavior by changing the activity of one neuron of an evolved network. 
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Tuci et. al. |0 have recently, in parallel to the work presented in this paper, 
applied a similar approach in order to solve an extended version of a landmark 
task also originally studied by Yamauchi and Beer 0. A simulated robot had 
to sometimes approach a light source and sometime avoid it, based on the posi- 
tion a reward-zone in the environment. A very complex and task specific fitness 
function, however, had to be used in order to evolve successful controllers. The 
reason for this might be that task studied required evolved controllers to use the 
same sensory modality for the two very different behaviors of light approaching 
and light avoidance. One could suspect that this fact made the search problem 
for the genetic algorithm much harder. In the work presented in this paper the 
focus is on applying much simple fitness functions, but shaping the experimental 
settings in oder to make to search task for the genetic algorithm accomplishable. 

We have previously compared the class of CTRNNs used in this article to 
Plastic Neural Networks (PNNs) with online modification of network weights, 
on a set of tasks requiring sensory-motor adaptivity and learning [0. The results 
obtained were that the PNNs were more adaptive with respect to changes applied 
to the environment after the evolutionary process, but the CTRNNs, on the other 
hand were easier to evolve to display learning-like abilities. For this reason focus 
of this paper will be on CTRNNs. 

The rest of this paper is organized as follows. Section ^describes the neural 
network used, section 0 describes the first simple experiment, section ^describes 
and analyses the more complex T-Maze experiment. Section 0 discusses the 
results obtained and concludes the paper. 



2 Continuous-Time Recurrent Neural Networks 



In the continuous-time recurrent neural networks used for the experiments in 
this paper the state of each neuron can be described by the following differential 
equation: 



t 

dt Ti 
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where N is the number of neurons, i (= 1,2, ...,A^) is the index, 7 ^ describes 
the neuron state (cell potential), is the time constant, Wij is the strength of 
the synapse from the presynaptic neuron j to the postsynaptic neuron i, Aj = 
cr(jj — 0j) is the activation of the presynaptic neuron where (t{x) = 1/(1 -I- e““) 
is the standard logistic function and 9j is a bias term. Finally, S is the number 
of sensory receptors, Wik is the strength of the synapse from the presynaptic 
sensory receptor k to the postsynaptic neuron i and Ik is the activation of the 
sensory receptor (Ik € [0, 1]). As in 0 the Forward Euler numerical integration 
method was used. The iterative update rule for the state of each neuron becomes: 
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(a) (b) (c) 



Fig. 1. The Khepera robot simulated in the experiments. The robot has (a) 8 infrared 
sensors distributed around the body used for object proximity and ambient light inten- 
sity measurements; (b) a linear vision module with 64 equally-spaced photoreceptors 
covering a visual field of 36°; and (c) a floor sensor measuring surface brightness. 



where n is the iteration step number and At is the step size. Initially the state 
of each neuron was 7i(0) = 0 Vf, the step size set to At = 1. The range of the 
other parameters were the following: 

r G [1, 70], 0 G [—1, 1] and w G [—5, 5] 

Notice from equation (0 how the time constant r determines the dynamics 
of the neuron. A high time constant results in a slowly changing neuron state 
while decreasing it makes the neuron more responsive to current synaptic inputs 
approximating a reactive neuron with no internal state in the limit. 



3 The Simple “Reinforcement Learning” Task 

The first experiment is a basic setup which requires acquisition and storage of 
“knowledge”. A simulated Khepera robot (figure IP is placed in a rectangular 
arena with two potential reward-zone positions (figure E), a light bulb to the left 
and a black vertical stripe on the right. The robot has 6 trials to find out where 
the reward-zone is, go there and stay there. At the beginning, the position of the 
reward-zone (grey half-circle in figure Q is randomly chosen, either to the left 
(below the light bulb) or to the right (below the stripe), and remains the same 
for 3 consecutive trials. After 3 trials, the reward-zone is switched to the other 
end of the environment. At the beginning of each trial, the robot is randomly 
positioned within the center third of the dashed line shown in figure Q always 
facing the black stripe. In order to make the task simpler, the robot can only 
move back and forth along this line, i.e. it cannot rotate. This is realized by 
having one motor output neuron controlling the speed of both wheels of the 
robot. 

The reinforcement signal comes from a floor sensoiQ (figure 1(c)) which is on 
when the robot is inside a gray reward-zone and Oj(f otherwise. Notice that this 

^ This sensor can be easily be mounted on a standard Khepera robot by placing an 
additional IR sensor under the robot body and connecting it one of the A/D channels. 
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Fig. 2. The environment used in the simple reinforcement learning task. To the left 
there is a light bulb and to the right there is black stripe on the wall. A gray reward-zone 
is randomly placed in either end (here shown to the right). The robot is constrained 
to move along the dashed line always facing the black stripe on the wall. Below the 
environment the view from the linear camera is shown. 



information is a sensory input just like any other, in contrast to conventional 
reinforcement learning systems where the reward signal plays a special role in 
the architecture and in the learning algorithm [Q. 

3.1 Network Architectures and Genetic Encoding 

The neural network architecture is shown in figure 0 The network consists of 
5 fully interconnected neurons (4 hidden -|- 1 motor output) and 6 sensory re- 
ceptors. Every neuron has synaptic connections from all neurons and all sensory 
receptors. The sensory receptors are configured the following way (see also fig- 
ure 0: 

— 2 Light receptors: The robots infrared sensors in passive mode are used to 
measure the ambient light. Only the two sensors on the back of the robot 
are used. 

— 3 Visual receptors: The linear vision module has 64 equally spaced pho- 
toreceptors spanning a visual field of 36° (figure fl(b)). The visual field is 
divided into 3 sectors and the average pixel-value in each sector is passed to 
the corresponding visual receptor. 

— 1 Floor receptor: An infrared sensor in the center of the robot body pointing 
downwards (figure 1(c)) measures surface brightness. If the robot is inside 
the reward-zone the corresponding receptor is on (=1) and otherwise it’s off 
(=0). 

The value from each receptor is scaled into the range [0,1]. The activation 
of the motor output neuron in scaled linearly in the range [—10, 10] and used 
to set the speed of both wheels of the simulated robot. The parameters of the 
network are encoded in a bitstring genotype (figure Q top). Each neuron has 
13 encoded parameters: a time constant (r), a threshold (0), and 11 synaptic 
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Hidden 



Motor 



Fig. 3. Genetic encoding of the parameters for one neuron (top) and architecture of 
the neural network used in the simple reinforcement learning task. Genetic Encoding: 
Each neuron parameter is encoded using 5 bits. 6 is the bias, r is the time constant, and 
wi . . . Wn are the strengths of the incoming synapses to this neuron. Neural Architecture: 
The network consists of 5 neurons (4 hidden + 1 motor output) and 6 sensory receptors. 
Every neuron has synaptic connections from all neurons and all sensory receptors. 




strengths (wij). Each parameter is encoded linearly within it’s range using 5 
bits, giving a total genotype length of 325 bits. 



3.2 Experiments 

The experiments were carried out in a realistic simulation of the Khepera robot 
based on samplings of infrared sensor values 0, computing geometric projec- 
tions for linear camera inputs and adding 5% uniform noise to every value. A 
simple genetic algorithm with rank-based selection was used. A population of 100 
neural controllers was evolved for 100 generations. At every generation the best 
20 individuals made 5 copies each. One copy of the best individual remained un- 
changed (elitism). Single-point crossover with a 0.04 probability and bit-switch 
mutation with a 0.02 probability per bit was used. Every neural controller was 




Fig. 4. Configuration of the 6 sensory receptors used in the simple reinforcement learn- 
ing task. 
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Fig. 5. Simple Reinforcement Learning Task. Thick line shows best fitness and thin 
line shows population mean (both are averages over 10 replications of the experiment). 



tested for 3 epochs of 6 trials each. Each trial lasted 150 sensory-motor steps 
(corresponding to 15 seconds of simulated time). At the beginning of each epoch, 
the neural controller was re-initialized by setting the state of each neuron to 0. 
This means that the robot could potentially build up and store information in 
the dynamic neuron states between trials within the same epoch. The fitness of 
an individual was proportional to the amount of time spend on the reward-zones 
subtracted a penalty for spending time in only one of the two zones: 

Egr*/iW + /,W-l/.W-SWI ^ 

epochs X trials 4>er -epoch x steps jper -trial ’ 

where f± is the number of steps spend in the reward-zone in trials with reward 
to the left and /2 is the number of steps spent in the reward-zone in trials with 
reward to the right. Notice that if fi or /2 is zero in an epoch the total fitness will 
also be zero in that epoch. Without taking the absolute difference between /i 
and / 2 , evolved controllers found the sub-optimal solution of only accumulating 
fitness in one end of the environment. 

The experiment was repeated 10 times with different initializations of the 
computer’s pseudo-random number generator. The results of the experiments 
are shown in figure 0 The thick line shows best fitness and the thin line shows 
population mean, both are averages over 10 replications of the experiment. As 
can be seen from the graph only about 40 generations on average were required 
for the fitness values to reach a stable high level. 

The typical behavior of an evolved controller is visualized in figure 0 The 
x-position of the robot is plotted over time for an entire epoch of 6 trialsQ Before 



^ Each trial is shortened from 150 to 100 steps in figure EJbecause the position of the 
robot always stabilizes within this time-frame for this individual. 
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Fig. 6. Typical behavior of an evolved robot in the simple reinforcement learning task. 
Top: Robot x-position is plotted against time over 6 trials. Bottom: Environment lay- 
out. See text for description. 



the first step of each trial the robot is randomly placed within the center third of 
the x-axis. This evolved robot begins to move to the left where the reward-zone 
is positioned for the first 3 trials. After 3 trials the reward position is switched. 
The robot still moves to the left by default, but when it discovers that the floor 
sensor remains off, it reverses direction and moves right. For the remaining trials, 
it “remembers” that the reward-zone is to the right and moves there directly. 

4 The T-Maze Task 

In the second experiment the robot had to navigate the T-maze shown in figure Q 
The task for the robot was in principle the same as in the previous experiment, 
but because of the increased complexity of the environment the robot was now 
allowed to move both wheels independently. The reward-zone (black square in 
figure 0 could be positioned in either the left or the right arm of the maze. 
The position of the reward-zone stayed fixed during each epoch. The robot was 
tested for 4 epochs of 5 trials each - two epochs with the reward-zone in each 
arm of the maze As before the neural network is re-initialized between epochs 
but not between trials. The optimal behavior of the robot in this environment 
is to use the first trial of each epoch to locate and “remember” the position 
of the reward-zone, and thereafter move directly towards it for the remaining 
trials of the epoch. To put additional evolutionary pressure on this behavior, 
the number of available sensory-motor steps was 360 in the first trial of each 
epoch and only 180 in the remaining 4 trials. Given the size of the maze this 
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Fig. 7. The T-maze environment used in the second experiment. The reward-zone 
(black square) can be positioned in either the left arm (shown above) or in the right 
arm of the maze. 



^ ^ ^ 




Fig. 8. Configuration of the 5 sensory neurons used in the T-maze task. The eight IR 
proximity sensors are paired two by and the fifth input is from the floor sensor. 



means that the robot only had time the explore the whole maze during the first 
trial of each epoch. In addition a poison-zone (white square in figure 10(b)) was 
positioned opposite the reward-zone in the last 4 but not the first trial of each 
epoch. An individual was immediately killed if it stepped over the poison-zone. 
With these changes the fitness function could be simplified to the sum of trials an 
individual ends it’s life inside the reward-zone. Since each individual was tested 
for a total of 20 trials the maximal possible fitness was 20. Notice that there 
is no direct pressure on evolving fast moving robots given this fitness function. 
This is however compensated by the fact the the number of steps in each trial 
is limited and has been adjusted to fit the size of the environment. 

In this experiment, in addition to the floor sensor, only one sensory modality, 
the infra proximity sensors of the simulated robot, was used (see figure E)- The 
network architecture was similar to the one shown in figure El The network still 
contained 4 hidden neurons, but now had 2 motor output neurons and only 5 
sensory receptors, giving a total of 66 connection weights and a total genotype 
size of 390 bits. Because of the increased complexity of the task the population 
size was increased to 200, where 40 parents make 5 copies each. The elite size 
was set to 5 and the range of the time-constant r was changed to [0,50]. The 
evolution was run for 200 generations and replicated 10 times. 
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Fig. 9. T-Maze Task. Thick line shows best fitness and thin line shows population 
mean (both are averages over 10 replications of the experiment). 



The fitness results of the evolutionary runs on this T-maze experiment are 
shown in figure 0 The evolutionary process found individuals able to collect 
the maximal fitness of 20 in 6 out of the 10 replications of the experiment. The 
maximal fitness in the 4 remaining runs was around 16. The behavior of an 
individual from the final generation of one of the successful runs is shown in 
figure fTTTla.nd UTI The robot starts out in trial 1 of the first epoch (figure 10(a)) 
by exploring the maze until it locates the reward-zone where it stays the re- 
maining time of the trial. In the following trials (figure 10(b)), the robot is able 
to retain the “knowledge” gathered during trial 1 and always turns left at the 
T-junction in order to move towards and stay on the reward-zone. In the epochs 
with reward to the right the robot moves directly towards the reward-zone in 
trial 1 (figure 11(a)), since the default behavior of this individual is to turn right 
at the first T-junction after a re-initialization of the neural controller. For the 
remaining trials this successful behavior is repeated (figure 11(b)). 

4.1 Analysis 

In order to better understand how the evolved neural networks worked, some 
further analysis on an individual from one of the successful evolutions on the 
T-maze task was done. The neuron activities of the network was recorded over 
two epochs, one with the reward in each end. The plots of the first trial of each of 
these two epochs are shown in figure 12(a) and 12(b), both these trial lasting 360 
sensory-motor cycles as during evolution. The top four graphs shows the activity 
of the hidden neurons (HO to H3) and the next two shows the activity of the 
motor output neurons (MO and Ml). The output of the floor sensor (F) is added 
at the bottom for clarification. The activities vary between 0 and 1 as defined by 
the logistic activation function. The neuron activities of figure 12(a) correspond 
to the robot trace shown in figure 10(a) and figure 12(b) corresponds to trace 
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(a) Trial T. The robot explores the 
environment, and after some time 
locates and stays on the reward- 
zone. 




(b) Trials 2-5: For the remaining 
trials the robot exploits the “knowl- 
edge” gained in trial 1 and moves 
directly towards the reward-zone. 



Fig. 10. Robot traces of an epoch with the reward-zone to the left. 



shown in figure 11(a). In the first case (reward left) the robot searches the whole 
environment before if finds the reward-zone, whereas in the second case it finds 
it directly. This can be seen by comparing the plots of the floor sensor (F) which 
goes high when the robot enters the reward-zone. In order to better understand 
how the network retains “knowledge” between trials it is useful to compare the 
activity of the hidden neurons towards the end of each trial. 

This can tell something about how the information which the network seems 
to retain is captured. It can be seen that the dynamics of neurons HO, HI and H3 
are very similar in the two plots whereas the activity of H2 differs significantly. 
In figure 12(a) the activity of H2 falls almost linearly from about 0.5 to about 0 
across the trial. In figure 12(a), on the other hand, the graph of H2 is similar for 
the first 150 sensory-motor steps, where paths of the two robots are the same, 
but then starts to rise when the robot enters the reward-zone, and is about 1 at 
end of the trial. The state of H2 at the end of the trials was —1.27 in the first 
case and 2.11 in the second. It seems that this neuron play a main role in the 
memory capabilities of this controller. In order to further test this hypothesis, 
an additional set of tests was done. 51 epochs of the robot moving in the maze 
for 1 trial were performed. Every neuron state was set to 0 (as during evolution) 
in the beginning of each epoch except for neuron H2 which initial state value 
was increased by 0.1 for each epoch, starting at -2.5 and ending at -1-2.5. The 
behavior of the robot when encountering the T-junction was monitored during 
these epochs. The robot was able to successfully navigate the maze in every 
case, although a bit slower than before. It generally took a couple of seconds of 
adaptation by jiggering movements before the robot was at normal speed. The 
results showed that the robot turned left at then T-junction whenever the initial 
state value of H2 was set below —0.3, and it turned right every time when the 
value was above —0.1. This result shows as hypothesized above, that the overall 
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(a) Trial 1: The default beha,v- 
ior of this individual of turning 
right takes the robot directly to the 
reward-zone in trial 1. 




(b) Trials 2-5: For the remaining 
trials the robot repeats this behav- 
ior. 



Fig. 11. Robot traces from an epoch with the reward-zone to the right. 



behavior of the robot can be modulated by changing the state of single neuron 
which plays a crucial role in this network’s “memory capabilities” . 

5 Conclusion 

The experiments in this paper have shown that the approach of evolving con- 
tinuous time recurrent neural networks to display reinforcement learning-like 
abilities initiated by Yamauchi and Beer [0 can indeed be extended to robotics 
task where an embodied agent has to move around in an environment. The 
first experiment was on purpose kept simple in order to show the feasibility of 
the approach. The second experiment showed that similar results could be ob- 
tained in a more complex environment. Successful individuals were in both cases 
able acquire and retain “knowledge” by in trial 1 exploring the environment it 
happened to be situated in. This information was thereafter exploited in the fol- 
lowing trials where direct traces towards the reward-zone was seen. This kind of 
“learning” resembles biological learning in animals much more than more tradi- 
tional approaches to learning such as reinforcement learning |^, where an agent 
normally need a substantial number of learning trials. 

The analysis of an individual in T-maze experiment showed that one of the 
hidden neurons in the network was responsible for storing crucial “knowledge” 
and thus generating the learning-like behavior. This shows that, at least in this 
kind of environments investigated in this paper, it is not necessary for an suc- 
cessful agent to build a cognitive map of the environment. If this is also the 
case in more complex environments needs to be investigated. Current research 
focuses on extensions to double T-maze environments similar to those recently 
investigated in [0 on a series of delayed response tasks. Future work also includes 
transfers of the evolved behaviors to a physical Khepera robot. 
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(a) Activity of each neuron plotted 
over trail 1 of an epoch with reward 
to the left. 




(b) Activity of each neuron plotted 
over trail 1 of an epoch with reward 
to the right. 



Fig. 12. Neuron activities. 
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Abstract. This paper describes the application of genetic algorithms to evolve 
new spatial masks for non-linear image processing operations, which are ulti- 
mately to be implemented on evolvable hardware. The development environ- 
ment was custom-built to allow full control over the evolution process and en- 
able the importance of the evolution strategy (including the representation 
scheme, parameters and fitness function) to be investigated and understood. 
Results of applying the evolved mask to threshold real-world images are pro- 
vided and are shown to be an improvement on conventional image processing 
operations. The envisaged infrastructure for the evolvable hardware is also 
considered, and the implementation of the image processing operations dis- 
cussed. 



1 Introduction 

The use of genetic algorithms in image processing is widespread and diverse, but not 
particularly well understood. The aim of this paper is to investigate the application of 
a genetic algorithm (GA) to a simple image processing operation so that the evolu- 
tionary strategy and the parameters that govern its operation may be studied in detail. 

A spatial mask is evolved to perform the adaptive thresholding of gray scale im- 
ages. A custom-built environment for the development of the GA is described which 
gives full control over the representation scheme, the GA, its parameters and fitness 
function. Results are provided illustrating the evolving of a new spatial mask and its 
application to real-world images. 

The ultimate aim of this work is to implement the algorithms within an evolvable 
hardware system which will provide maximum flexibility and speed in the solution of 
demanding image processing problems. 



1.1 Evolutionary Algorithms in Image Processing 

A broad and disparate range of examples of the use of genetic algorithms and genetic 
programs in image processing may be found in the literature. Notable examples are 
the use of genetic programs in the segmentation of medical resonance imaging scans 
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[1], a genetic program that performs edge detection on one-dimensional signals [2] 
and the evolution of genetic programs to detect edges in petrographic images [3]. 

The work described in this paper is an extension to that first undertaken by Holl- 
ingworth and the authors [4] . This was concerned with the evolution of spatial masks 
to detect edges within gray scale images, which subsequently, could be implemented 
within an evolvable hardware environment. The latter is described below. 



1.2 Evolvable Hardware Architecture to Support Image Processing 

The evolvable hardware architecture proposed by the authors for implementation of 
the evolutionary algorithms is shown in Figure 1 . Each processing block in the net- 
work is preloaded with a single pixel from the image to be processed. 



Processing 
blocks 
containing 
image pixels 




Fig. 1. Evolvable hardware architecture for image processing operations 



The common processing block is shown in Figure 2. The main components of this 
processing block are: 

• the particular image pixel under consideration; 

• the values of the neighboring eight pixels in the image; 

• the logic function to be applied to these (defined by the genotype string), and 

• the resulting output pixel that will replace that under consideration. 
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Fig. 2. Single pixel processing block 

The genotype string is used to ‘program’ the logic function, which then calculates the 
value of the output pixel using the values of the input and neighbouring pixels. 



1.3 Evolving New Image Processing Operations 

The new image processing operations reported in this paper were evolved using ge- 
netic algorithms in a software environment, but were specified such that a future 
hardware implementation will be possible. The image processing operation chosen to 
investigate was dynamic or adaptive thresholding [7] as this is a simple yet clearly 
non-linear local neighbourhood operation that is not traditionally implemented with a 
single spatial mask. The conventional method is to apply a different threshold to each 
local area within the image, based upon some statistical function of the values of the 
pixels contained therein. The method chosen for comparison in this investigation is 
the mean-C procedure. This simply calculates a threshold value by subtracting a 
constant C from the mean pixel value for the local area of the image currently being 
processed. The aim of evolving a single spatial mask to perform adaptive threshold- 
ing may appear illogical but provides an interesting platform in which to investigate 
the evolution of such image processing operators. 



2 Method 

Although purpose written environments such as GALib [5] and Gpc-H- [6] are avail- 
able for the development of genetic algorithms and genetic programs respectively, a 
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specially written software environment was developed to enable full flexibility and 
control over the design of the evolutionary algorithms (EA). For this work a standard 
approach to the implementation of a genetic algorithm (GA) was employed. This 
requires a population of possible solutions to the problem to be created, which 
evolves over a number of generations. The fitness of each member of the population 
is calculated and will determine whether it survives to the next generation or dies. To 
help generate diversity in the population operations to simulate meiosis, crossover 
(the sharing of genetic information between parents) and mutation are often imple- 
mented. The main components of the GA, such as the representation of the EA, the 
parameters governing its operation and the evaluation of a fitness function are now 
considered. 



2.1 Representation 

The basic structure of an image processing mask or window operator is simply a 
matrix of weights which are applied to a number of neigbouring pixels within the 
image on a repetitive basis until the entire image has been processed. A typical 3x3 
pixel mask is shown in Figure 3. 



W1 


W2 


W3 


W4 


W5 


W6 


W7 


W8 


W9 



Fig.3. Typical image processing mask operator 

Typically, the weights W1-W9 are applied to the respective pixel values and 
summed. The resulting value is then used to replace that of the pixel of interest, usu- 
ally the centre pixel. Other manipulations of the pixel element are also common, as 
are larger masks such as 5x5 and 7x7 pixels. 

The representation of the GA used in the evolution of the new image processing 
operation is simply the weights W1-W9 arranged in a single binary string as shown in 
Figure 4. 




Fig.4. Encoding of a 3x3 spatial mask to a 36-bit genome string 
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This has the distinct advantages of simplicity in application of the evolutionary 
process and implementation on the proposed evolvable hardware platform. 



2.2 Selection Method 

Standard implementations of both rank and tournament selection have been imple- 
mented to select individuals for the next generation. 

In tournament selection, a specified number of individuals from the population are 
chosen randomly and the fittest of these is used in the next generation. 

Rank selection requires the members of the population to be arranged in an array 
in order of fitness. Element 0 is the fittest member and element P-1 (P is the popula- 
tion size) is the least fit. A random number r is generated between 0 and land used 
within the following formula to generate the position of a ‘fit’ individual in the popu- 
lation array: 

/ = (int)Pxr'’“^ 

bias is the selection strength that varies from 1 upwards such that bias=l means there 
is an equal chance of selecting every member of the population. Increasing bias 
gives strong biasing towards the fittest individual, i is the position in the array of the 
chosen individual. 

2.3 Mutation Method 

Three methods of mutation have been implemented to modify individuals for the next 
generation: unbalanced mutation, balanced mutation and create new population. 

Unbalanced mutation simply loops through each node of the genome and depend- 
ing on the predetermined mutation rate, replaces the node with a new random value, 
within a set range. In this implementation, the genome is the binary string containing 
the spatial mask weights, and each node relates to an individual weight within this. 

Balanced mutation is similar to that of unbalanced mutation but an algorithm is 
employed to ensure the sum of all the nodes within the genome is preserved. 

Create new population simply replaces the entire genome (the binary string) with a 
new, randomly generated one. 

In the custom software environment developed for this work, it is possible to em- 
ploy all three mutation methods randomly in the generation of new members of the 
population. 



2.4 Crossover Method 

Crossover is more controlled than mutation for modifying individuals for the next 
generation as it replaces a node of the genome of one individual with that of another, 
and not purely a random value. Two methods of crossover have been implemented 
here: in-place and shujfle positions. 
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In-place ensures that the nodes swapped between the parent genomes are kept in 
the same relative positions. 

Shuffle positions, however, allows nodes to he placed without regard to their rela- 
tive positions in the parents’ genomes. 

As with the mutation facility, within the current evolutionary environment it is 
possible to employ both crossover methods on a random basis. 



2.5 Fitness Function 

The role of the fitness function is to evaluate the performance of each individual 
within the population. In the current context of evolving image processing mask 
operators, the fitness is determined by applying the evolved mask operator to one or 
more images and determining the success with which the desired operation has been 
achieved. 

To facilitate an objective measure of fitness a number of good quality images are 
selected and then purposely degraded to provide test images to which the evolved 
mask operator can be applied. The resulting images can then be directly compared 
with the respective original images and a measure of fitness calculated based on the 
number of falsely identified white and black pixels as follows: 

fitness = bias*falseBlackPixels ■+ (1 -bias )*falseWhite Pixels 

where bias is a value ranging from 0 to 1 and is used to weight the fitness according 
to the desire to deter the identification of false black pixels, false white pixels, or with 
a value of 0.5 give equal regard. 

The final fitness value is simply the sum of fitness values for each test image em- 
ployed. 



3 Results 

New spatial masks for adaptive thresholding were evolved using a set of eight test 
images shown in Figure 5. These 64x64 pixel grayscale images were generated by 
degrading the set of black and white images shown in Figure 6. The latter are subse- 
quently used as target images in evaluating the fitness of each individual of the popu- 
lation. 

The set of parameters for the GA employed to evolve the masks are listed in Ta- 
ble 1. 

The range of permissible values for each node in the genotype (which equates to 
the weight in the resulting spatial mask) was of particular interest and a comparison 
of five different ranges was considered: H-/-10, H-/-50, H-/-100, H-/-1000 and H-/-10000. 
The evolved spatial masks and the result of applying each mask to the sixth test im- 
age is shown in Figure 7. The relative effect in terms of the best fitness function is 
shown in Table 2. 
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Fig. 5. Test images used in evolving adaptive threshold spatial masks 



T“lQ9iO 

eR'^raSM 

N;h7SB 


VPldwp 

£2&uX,? 

IFbnjx4C 


%vWJUy 

okt8,YB 

0@qgfE: 


6D[s(fCj) 

ZjA’SU) 

]3H-D 


%wUT2 

kJD'Ed9 

•g9vxlRh 


o.uzxMS 

5bdn\jK6 

OWD?K 


YSDmozt 

yU?mp;m 

[49VxY 


OtSvPJ” 

OQoJ’“2 

?jWZ“W 



Fig. 6. Target images used in evaluating the fitness of the evolved spatial masks 



Table 1. Parameters employed for first evolution of adaptive threshold masks 



Parameter 


Value 


Dimension of evolved mask 


3x3 


Population size 


100 


Generations evolved 


1000 


Selection method 


Tournament 


Mutation method 


All 


Mutation rate 


0.2 


Crossover method 


All 


Crossover rate 


0.95 
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Fig. 7. Effect of node range value on evolved spatial masks and subsequent application to test 
image 6 



Table 2. Effect of node range on average and best fitness 



Node range 
(+/-) 


Best fitness 
(%) 


10 


65.16 


50 


72.50 


100 


74.48 


1000 


77.17 


10000 


77.60 



The evolved masks were then applied to real world images which are considered 
demanding for successful adaptive thresholding using conventional approaches. The 
original images are shown in Figure 8 and the result of conventional global and adap- 
tive thresholding (using the mean-C technique) are shown in Figures 9 and 10 respec- 
tively. 




Fig. 8. Original ‘real-world’ images 
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75% to generated less false black pixels. The results of this can be seen in Figure 12 
and are arguably an improvement on the conventional adaptive threshold and the 
previously evolved mask. 
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Fig. 12. Evolved 3x3 mask (75% bias) applied to images 



Finally, evolution of a 5x5 pixel masks was investigated, again with a node range 
of +/-100 and a fitness function bias of 75% (other parameters remained the same as 
listed in Table 1). The evolved mask is shown in Figure 13 and the result of applying 
the evolved mask to the ‘real-world’ images is shown in Figure 15. For comparison, 
the result of applying a conventional 5x5 mean-C adaptive threshold mask is shown 
in Figure 14. It is again arguable that this evolved mask is an improvement on both 
the conventional adaptive thresholding and previously considered evolved masks. 

For information, the evolution of the 5x5 mask in terms of best and average fitness 
per generation is shown in Figure 16. 
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Fig. 13. Evolved 5x5 adaptive threshold mask 



4 Conclusions 

The aim of this paper is to demonstrate that simple, yet effective, image processing 
spatial mask operators may be evolved, and that these may be easily implemented on 
an evolvable hardware platform. The adaptive thresholding mask evolved is compa- 
rable, if not an improvement, on a conventional mean-C thresholding mask and far 
superior to conventional global thresholding operation. The evolvable hardware plat- 
form considered for hosting these masks, which is currently under development, will 
enable further masks to be evolved in real-time to provide unique solutions to de- 
manding image processing applications. 
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Fig. 15. Evolved 5x5 mask (75% bias) applied to images 




Generation 

Fig. 16. Evolution of 5x5 spatial mask: best and average fitness per generation 
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Abstract. This article describes an EPGA (Eield Programmable Gate Array) 
based hardware implementation of a genetic controller to be applied for the 
evolution of an Artificial Neural Network (ANN) [3] for collision-free naviga- 
tion task of mobile robots. The adaptive nature of ANN enables it to train itself 
while the robot interacts with the environment. In addition to online training, 
the genetic evolution in neuron bits will be examined in an experiment to un- 
derstand the interaction between evolution and lifetime adaptation of the ANN. 
The concept of chromosome for navigation task, design techniques of various 
blocks inside the GA controller will be elaborately described here. 



1 Introduction 

Genetic Algorithms (GAs) introduced by John Holland [6] are adaptive search strate- 
gies based on a highly abstract model of biological evolution to find a possible solu- 
tion in a given problem space. The genetic process can evolve the neural architecture 
of an ANN. It starts with a population of robots with random configuration bits 
known as chromosomes. Genetically-inspired operators like crossover and mutation 
are used to introduce new individuals into the population [6]. To realize the GA op- 
eration inside an FPGA, a random generator, a crossover block, a partner selector and 
a fitness block were modeled in VHDL (Very High Speed Hardware Description 
Language). GA based approach is the basis for most evolutionary robotics applica- 
tions [6, 10]. Neural networks can easily exploit various form of learning during life- 
time and this learning process may help and speed up the evolutionary process [1]. 
The neural network will undergo change as a result of two forces: learning during the 
“lifetime” of a network and evolutionary change over the course of several “genera- 
tions” of networks. It will be investigated how learning at the individual level can 
have an influence on evolution at the population level. In a sensory-motor navigation 
task the inputs to an ANN [3] are object patterns from the sensors which tell the ANN 
about obstacle-scenarios around the robot. Depending on the pattern a decision is 
made by the ANN and appropriate commands are sent to the motors for proper 
manouevre. Each robot is evaluated according to an objective fitness function derived 
from its ability to avoid collisions. The larger the fitness the more chance the robot 
has to produce offspring in the next generation of population. Over a number of gen- 
erations, the fitness of the population increases and successful architectures are cre- 
ated. The use of an FPGA can allow high speed parallel processing [2, 9, 13] of the 
neural network and the genetic controller. VHDL was used for modeling devices as it 
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allows simulation over many different levels of abstraction and is widely accepted as 
a standard for modelling hardware [12, 11, 7]. The design package available for the 
project, MAX+Plus II version 10. 1, converts the VHDL design file into a bitstream 
file which configures the FPGA. The ANN that will be evolved was implemented by 
the same authors and can be found in [3]. 



2 Chromosome 

The heart of the control structure of the robot is its Chromosome. This consists of an 
array of bits of storage. These bits are grouped together to form neurons. The size of 
the neurons or the number of bits per neuron, neurons per class and number of classes 
in a network actually define the architecture of the network. Neuron contents or the 
bits of the neurons define the behaviour of the network. All these behavioural bits and 
network architecture are stored in the chromosome in binary values. In a typical de- 
sign [3] for a network with 3 classes, 5 neurons per class and 8 bits per neurons the 
number of bits in a chromosome will be 120. The pattern of these 120 bits in chromo- 
some will define the behaviour of the ANN. The next part of the chromosome is the 
configuration bits for sensor modules. For each sensor two bits are required, so for 8 
sensors the total number of configuration bits are 16. Two bits of each sensor will 
determine the presence of that sensor in the next generation. The different combina- 
tions of bits could be “00” ,“01”, “10” and “11”. The control strategy was such that 
the sensor would be enabled only when the genetic pair is “00”, for other three cases 
it is disabled. Thus the probability of the phenotype of having that sensor to the next 
generation is 25%. This approach is called the “dominance approach” [8] where all 
features are recessive [4]. 
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Fig. 1. (a) Chromosome in RAM (b) Identification number and Fitness value with the chromo- 
some 



The last three bits in the chromosome are reserved for the speed control of motors. 
3 bits give 8 levels of speed. In a group of robots a number is associated with each 
robot to identify it from other robots. This identification number can be stored in the 
RAM at one side of the chromosome. The length of the identification number depends 
on the number of robots. For three or four robots the length is two bits, whereas for 
five to eight robots the length should be three bits. In general for 2"'‘ H- 7 to 2“ number 
of robots the length of the identification number will be n bits. Fitness value of the 
robot can be stored in the RAM at other side of the chromosome as shown in the 
Figure 1. Thus the fitness which measures the robot’s performance will be written to 
this particular location of the RAM. The length of the fitness value could be as long as 
17 bits as used in the designed system. 
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3 Partner Selector 

Among a group of mobile robots partner selection criteria could be as simple as 
choosing the robot with highest fitness value to mate with all other robots. In other 
ways it could be more complex like the roulette- wheel technique [10]. Due to the 
restriction in number of cells in the FPGA and to avoid the complexity the simplest 
approach should be preferred. In a group of robots each robot could be identified by a 
certain number. For example in a small group of three robots it could be robot 1, robot 
2 and robot 3. A similar block as shown in the Figure 2 could be modelled in VHDL 
to select the best robot to breed. A comparator in this block compares the fitness val- 
ues and singles out the highest one. In the RAM three different zones can be used to 
store three different robot’s information. Information consists of a robot’s identifica- 
tion number, chromosome and fitness value. As shown in Figure 2 in the top zone the 
robot’s own information is kept. In the middle and bottom zone information from the 
two other robots can be stored. The information with a greater fitness value between 
the middle and bottom zone is always swapped to the middle. When the robot re- 
ceives the information data from another robot it stores the data in the bottom zone of 
the RAM. This means that previous data in the bottom zone can be overwritten by the 
new data. Upon receiving the next robot’s data again the fitness value of the bottom 
zone is compared with the fitness value in the middle zone and if it is greater, then all 
the information stored in the bottom zone are swapped to the middle zone. After 
swapping the “ready _to_receive” signal goes to HIGH to indicate that this robot is 
now ready to receive data from another robot. If this is the turn of this robot to send 
its own information, then the ‘ready_to_send’ signal goes to HIGH instead of the 
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‘ready_to_receive’ . When all the ‘ready _to_receive’ signals from all other robots are 
HIGH then it sends its own information data from top zone of the RAM. When all the 
robots’ information are received and compared and swapped to the middle zone, the 
middle zone contains the information data of the robot which has the highest fitness 
value. At the beginning of the partner selection process the robot’s own information 
data is copied from the first zone to the middle, so if its own fitness is the highest then 
still it will be preserved in the middle zone as the comparator always keeps the data 
with higher fitness value in the middle. After the selection process the ‘part- 
ner_selected’ goes to HIGH which triggers the crossover. To read or write in the 
RAM the ‘ram_state’ signal is sent to the RAM State Controller (RSC). Depending on 
the ‘ram_state’ signal ‘we’, ‘oe’ and ‘rce’ signals are set by the RSC. Details of the 
RSC can be found in [3]. 



4 Crossover 

The Partner selector block selects the best chromosome, that with the highest fitness 
value, and saves the chromosome in the middle zone of the RAM (Figure 2). The 
crossover block now copies a portion of the best chromosome and then pastes it on 
the robot’s own chromosome stored in the top zone. So some characteristics of the 
best chromosome will be passed to the population in the next generation. A logic 
HIGH on the ‘partner_selected’ signal enables the crossover block and swapping of 
bits between best chromosome and the robot’s own chromosome starts. First a point 
of crossover is selected randomly. The modelled crossover block is shown in the 
Figure 3. The block gets a random number input within the length of the chromosome 
to start the crossing over at a random point. If the chromosome has a length of 139 
bits then the random number could be between 1 to 138. For example if the number is 
136 (as shown in simulation in Figure 4) then the four bits from 136 to 139 of best 
chromosome will be swapped sequentially to the same locations of the robot’s own 
chromosome. In the Figure 3 the part that has to be swapped is identified by the 
shaded area. For accessing the RAM proper RAM control signals are defined by the 
‘ram_state’ signal to the Ram State Controller (RSC) . This signal is also shown in 
simulation as 0 (for idle), 1 (for writing) or 2 (for reading). 

Once the crossover is completed the ‘crossdone’ signal is set to HIGH for one 
clock period which disables the crossover block until the ‘partner_selected’ signal 
again goes to HIGH. The simulation of this block shows proper operation (Figure 4). 
Random number was 136 in the simulation and while crossing over, it first reads one 
location of a chromosome and then writes it to the same location of the other chromo- 
some and then it copies and pastes the next location and this way it sequentially pro- 
gresses until the last bit of the chromosome is moved. 



5 Fitness Block 

A reward-punishment scheme was applied while evaluating fitness. A robot’s fitness 
is evaluated while a generation is running. The fitness function measures the perform- 
ance of the robot with respect to the desired behaviour and task. For the obstacle 
avoidance problem a simple rule can be applied: fitness will be increased (rewarding) 
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each time the robot successfully avoids any obstacle. Its fitness will be decreased 
(punishing) each time it collides. For example a fitness function can be programmed 
like this: 

1 . Fitness is increased by one every second the robot moves avoiding collision, 

2. Fitness is decreased by three in every operating cycle if the robot has a collision , 

3. If the robot starts turning for more than five seconds then the fitness is again de- 
creased by two. 
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Fig. 3. Crossover Block 




Fig. 4. Simulation of Crossover Block 

Similarly more sub functions can be added to access the performance of the robot. In 
the above example the first function encourages the exploration and second and third 
functions punish the robot to discourage collision and spinning. A fitness block was 
modelled in VHDL to calculate the fitness value. This calculated fitness value is 
stored in a 17-bits register which can be called as fitness register. The robot life starts 
with a sufficient fitness value such that even if the robot collides all of its life time, 
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the fitness number comes out at the end of the life should not be less than zero. So for 
a robot with a life time of five minutes the starting point could be in the range of 
90,000. It loses 3 points for every collision, loses 2 points if it starts turning for more 
than five seconds and in every second it gains one point if it moves forward. These 
timings were measured by a counter. Signals from microswitches (msl to ms8) are 
fed into this block to detect collisions. To know the direction of the motor the direc- 
tion command (m_dirc in Figure 6) from the motor control block are also fed into this 
block. Once a generation is finished the calculated fitness value is written in a particu- 
lar block of the RAM. The state (read, write, idle) of the RAM is determined by the 
‘ram_state’ signal sent to the RAM State Controller (RSC). Fitness block has three 
states: Initialisation, Evaluate, Writing. Two control signals (fwenb, rset) actually 
define the state of this block. 
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Fig. 5. Data Swapping in Crossover 




Fig. 6. Fitness Block 

Simulation results of this block are shown in Figure 7. The affect of collision and 
spinning on fitness value is shown in the figure. When the ‘m_dirc’ is ‘OF it indicates 
that the robot is spinning and the ‘d’ variable starts counting up (Figure 7) and when it 
reaches a value of 500 (which indicates 5 sec when a clock with a period of 10ms is 
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supplied) then 2 is subtracted from the fitness value. So in the Figure 7 at point ‘B’ 
value of the ‘regfit’ changes from 90001 to 89999. The register named ‘regfit’ was 
used to store the fitness value. When the fitness block is initialised the ‘regfit’ starts 
with a value 90000. At the point ‘A’ in the simulation a 1 is added with 90000 after 
the robot moves forward successfully for 1 second. This 1 second is determined by a 
variable ‘c’ when it counts to 100 with a lOOHz (10ms period) clock input. At ‘C’ in 
the simulation the regfit is again initialised to 90000 as the control inputs are rset = 1 
and fenb= ‘O’. At ‘D’ a collision is detected and it continues for a while. Due to this 
collision fitness value is decreased by 3 in every 10 ms, so if a collision stays for 1 sec 
then 300 will be subtracted. This simulation is just an example of a fitness block. 
Other fitness functions can be added in this block. 
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Fig. 7. Simulation of Fitness Block 



6 Training on Genetic Evolution 

The objective of this experiment was to find out how the training in robot’s life time 
could effect the evolution of the neural network. The evolution was realised among 
three robots. In the typical system the partner selection and the crossover operation of 
the genetic process were realized inside the PC, but the fitness value was calculated 
inside the FPGA. At the end of each generation the fitness value and the neuron bits 
were transferred from the FPGA to the PC and the PC then took over the control of 
the rest of the genetic control. Once the new offspring was created, the neuron bits 
were downloaded to the robot’s RAM. To save design and construction time proto- 
type boards (with Altera’ s FLEX EPF8452ALC84-4 chip) were used to house the 
robot’s controller modelled in VHDL. Each prototype board has a limited number of 
I/O pins which led us to employ three boards to house the complete controller. The 
board containing the ANN was 80% full using 269 logic cells of Altera’ s 
EPE8452ALC84-4 chip and the board containing the fitness block was 66% full using 
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224 logic cells of a similar FPGA chip. The third FPGA was carrying the sensor and 
motor control blocks and it was only 30% full utilizing 102 logic cells of the FLEX 
chip. As one set of FPGA boards were designed, offspring were downloaded one by 
one. Two distinct evolution were examined: trained and untrained. In trained evolu- 
tion the robot was trained in every generation. The untrained evolution was purely 
genetic without training. An environment was created with obstacles and walls where 
each robot was evaluated one by one. The robot’s performance was measured by a 
fitness function which was as follows: 

1. Each generation starts with 90000 points. 

2. Reward: Fitness is increased by 300 points in every 1 second robot moves forward 
without any collision. 

3. Punishment: Fitness is decreased by 3500 points in every 10 seconds robot is turn- 
ing left or right. 

4. Punishment: Fitness is decreased by 5 points in every 10.95 milliseconds if robot 
has a collision. 

Different points in the fitness function were carefully chosen so that it gave more 
or less the same weight for reward and punishment. Though it seems that only 5 is 
subtracted for a collision but actually a huge number will be subtracted if a collision 
stays for a while. This is because 5 is subtracted in every 10.95 ms (operating cycle) 
so if a collision stays for 10 second then accumulated subtracted number will be 4550. 
Similarly if the robot moves forward for 10 seconds without collision then the fitness 
value can increase by up to 3000 points. The settings of the experiment are summa- 
rized in Table 1. 



Table 1. Summary of Experiment Settings 



Parameter 


Definition 


ANN Controller for 


ANN in [3]; 3 classes-5 neurons per 


Trained evolution 


class-8 bits per neuron 


ANN Controller for 


ANN in [3]; 3 classes-5 neurons per 


Untrained evolution 


class-8 bits per neuron 


Initial Fitness Value 


90,000 


Generation time 


2 mins 


Crossover 


One point 


Mutation 


0% 


Training Algorithm 


As described in [3] 


Initial Neural Contents 


Random 


Motor Speed 


Fixed at maximum 


Sensors Enabled 


3 sensors on the left, right and front 
were enabled. 



6.1 Discussion 

This experiment demonstrates that the trained evolution converges to the optimum 
solution much faster than the untrained evolution. Both evolutions were started with 
the same random chromosomes and after five generations all three robots in the 
trained evolution achieved higher fitness values than the untrained robots of the same 
generation. A chromosome which is biased in one generation can be a very good 
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Fig. 8. Robot’s Fitness in Trained and Untrained Evolution 
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Fig. 9. Higher Fitness in Trained Evolution 



chromosome in the next generation. A sharp rise of the fitness value of robotl from 
generation-3 to generation-4 explains this phenomenon. In generation-3 robot2 had 
the best chromosome, highest fitness value, and at the end of this generation the 
crossover between the chromosomes of robotl and robot3 improves the biased chro- 
mosome of robotl to a very good chromosome in generation-4. After that robotl 
dominated and it also helped robot3 to improve its chromosome abruptly from 
generation-4 to generation-5 and this explains the sharp rise of fitness value of robot3 
from generation-4 to generation-5 in Figure 8. So when training is applied there is 
always a good possibility of having a robot with a very good chromosome at a much 
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earlier stage than in untrained evolution. Once a good robot comes out it will help to 
improve the chromosomes of other robots in successive generations. In untrained 
evolution the probability of getting a good robot at early stage is much less. Floreano 
and Mondada [5] has described that the Khepera robot genetically learns to navigate 
and avoid obstacles in less than 100 generations and in around 50th generation the 
best individuals exhibit a near to optimal behaviour. But the current experiment shows 
that a simple training algorithm can reasonably drop down the number of generations 
to get the control structure for well navigated robots. There was one biased action 
found in this experiment that even could not be corrected by either training or evolu- 
tion. When there were obstacles both on the left and the right all the robots were al- 
ways turning to the left instead of going forward. This could be corrected by muta- 
tion. Mutation flips the bits from 1 to Os in several random positions of the 
chromosome. So any biased actions due to many Is in a particular block of the chro- 
mosome can be gradually changed by mutation in a number of generations. Alterna- 
tively, when the population is large the probability of getting robots with good chro- 
mosomes increases. So chance of getting well navigated robots in a large population 
is always high. Well navigated robots can improve the robots with bad chromosomes 
by evolutionary process even without mutation. Another point to be noted in this 
experiment is the abrupt fall of fitness value of robots from generation- 1 to genera- 
tion-2. This was the effect of training in unexpected situations like facing an obstacle 
below the height of the sensors. The unexpected situations were created on purpose 
by hand to find out how the training algorithm works in unexpected situations where 
the object could be too short to be detected by the photodiodes. So upon a collision, 
the supervisor circuit assumes that there is an object although the detector fails to 
detect it. This gives wrong address information at the inputs of the neural network and 
thus the training data will be written in the wrong locations of the ANN. Thus the 
robot starts behaving strange after the training. Because of this strange behaviour 
fitness value of robot 3 fell down sharply at generation-2 in the trained evolution. Still 
through the evolutionary process with the help of other good robots the robots again 
got back a better fitness value at generation-5. It demonstrates that if we disturb the 
system by pulling it away from its stable state, it will get back to the stable state by 
the evolutionary process. 



7 Conclusion 

The experiment in this article explains that the learning process in robot’s life time 
can help to speed up the evolutionary process. It also demonstrated that the evolution- 
ary process helped the system to get back to its stable state even though the system 
was disturbed. This experiment does not involve the collective behaviours of the 
robots. A future work could be to constitute a fully distributed control systems em- 
bodied in FPGAs for more than one robot at a time. If the controller is entirely decen- 
tralized it will be inherently scalable to large numbers of robots in a shared task envi- 
ronment. The current designed controller will definitely provide an appropriate 
substrate for this and will be an interesting platform for the future work in collective 
robotics and artificial life. 
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Abstract. Although hardware evolution is becoming a more popular 
topic of research, the main focus of this research tends to be with re- 
configuring electronic circuits using evolutionary techniques. Taking a 
step back, my research looks at some of the problems of configuring 
autonomous, mobile systems for varying goals and environments. Con- 
centrating on optical sensors, I am hoping to show that evolving the 
placement of sensors on the surface of the entity and the frequencies of 
light these sensors respond to will improve the entity’s performance in 
the environment. 

This paper discusses some of the issues of working outside of simulation 
and presents a hardware platform I consider solves many of these to 
enable me to carry out the research described above. Unlike other mobile 
solutions, this platform was designed to be cheap, work in multiple entity 
environments and cope with large numbers of sensors (in my research, 
close to 200 sensors per entity) while still being extensible. This paper 
also covers issues such as processing power, environment management 
and entity interactions. 



1 Introduction 

A view slowly growing in popularity among evolutionary hardware researchers is 
that the richer the medium in which you can evolve solutions, the more effective 
the evolutionary approach and the ‘better’ (always an experimentally dependent 
measure) the resulting solutions |2|. This has lead to research which tries to 
exploit the inherently complex properties of physical mediums such as doped 
silicon (using FPGAs and ASIGs), radiation damaged silicon and more exotic 
materials such as liquid crystals and complex polymers m However, very 
little work has been done on a more macro level, working in real (although not 
necessarily cluttered) environments with real hardware. My research intends to 
focus on this larger-scale environment and hope to answer many of the questions 
surrounding the potential benefits of allowing hardware to configure portions of 
itself in its native working space. 

The research will focus on the placement and sensitivity of light sensors on 
mobile robots interacting with each other and a real environment. Goals for the 
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robots will be based on finding ‘Food Cubes’ which can emit light of different 
wavelengths and mating by bumping into other robots. Any hardware platform 
for this research needs to meet a number of criteria and this paper presents a new 
platform which meets these needs and is flexible enough to be used by others 
both beyond the scope of my research and in entirely different research areas. 
In section three the criteria are described and in section four the new design is 
described in more detail, showing how the criteria have been met. Finally, in 
section five, some of the restrictions of the platform are covered with possible 
solutions mentioned for completeness (although they will not be implemented 
in the current platform). First, however, section two takes a closer look at why 
real world experiments can be beneficial as well as some of their disadvantages. 

2 Simulation vs Real World Experiments 

Carrying out any form of research using simulations has a large number of ben- 
efits. The researcher is lord and master of the environment they are working in: 
capable of stopping and rewinding time, moving objects around and even chang- 
ing the laws of physics at a whim. It is a hugely attractive way to work and 
there is much that can be gained from working with them. However, simulations 
should not be treated as identical to the real world and, if you want real-world 
experimental results, relying on a simulations to provide them is fraught with 
dangers. 

When researching robotics and evolutionary hardware, many researchers do 
not decide to use simulations because of the benefits they give, but more because 
it would be more difficult and expensive to do the same experiments in the real 
world. A simulated environment allows more experiments to be run at once, 
is often cheaper and allows certain tricks and ‘cheats’ that you simply could 
not apply in the real world (such as stopping time so you can calculate reward 
and fitness values for your population). The problems start with how good the 
simulation is. Of course, if the research is not really bothered about the real 
world, a simulation is the best way to carry out the experiments. But as soon as 
you try to evolve an entity in simulation and then transfer that entity to the real 
world, building the simulated environment suddenly becomes much more difficult 
[0. Evolutionary methods are adept at finding small advantages in unexpected 
places and using information from the environment to great advantage. Any 
simulation would have to model the real environment very closely in order to 1) 
allow the evolutionary processes to find good solutions that take full advantage 
of the real environment and 2) find solutions that don’t rely on quirks of the 
simulated environment. In a simulated environment designed for robots with 
only light sensors (for instance) reflection, refraction, shadows, surface textures 
and imperfections in the environment may all have huge information content 
that the evolutionary processes could make use of. 

Another problem with simulated environments is noise. Real environments 
are very noisy while simulated environments are not. To overcome this, many 
simulations introduce white (random) noise into sensor values and movements 
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within the environment. In the real world, there is no such thing as white noise 
- that is, noise with zero information content. Sensor values may fluctuate and 
movement may not be accurate but these are not random. They are caused by 
something either inside or outside the environment. Slippery or bumpy floors 
could help with location identification. Sensor value interference due to noise 
from another robot’s motors could help a robot And a mate. Even external noise 
could help in navigation because it is unlikely to be uniformly distributed. Over 
time, some types of noise will also change. Again, this is very unlikely to be 
random and could also be beneficial. 

3 Why Design a New Platform? 

Before committing to the time and expense of designing a new platform, the 
readily available platforms were considered. However, as is described in more 
detail below, the requirements for the platform are quite strict and none of the 
available platforms could meet them all: 

Multiple Entities. The research the platform was to be used for is based on 
an evolutionary approach with fitness evaluation being implicit in the ability 
of the entity to And food and mates. For this, multiple robots would need to 
coexist within the same environment. This immediately restricts the methods 
of communication between robots, their environment and any housekeeping sys- 
tems. Any technologies implemented would have to be wire free and a number 
of potentials were considered. Ease of implementation, reliable connection, low 
cost and reasonable data transmission were all prerequisites. 

Processing Power. Each of the robots is a complete unit and needs to be 
able to run a control system to survive in its environment. Control systems to 
be tested in the research include neural networks (static and evolved), hand- 
programmed systems and others. For this, quite considerable processing capa- 
bilities would be needed. On top of this, overall control of the experiments was 
also required. This overall control system will centrally determine, among other 
things, which potential offspring to initiate and when each entity dies as well as 
which inactive robot to use next for a new entity (based on battery level). 

Expandable Design. The research demands a design that can cope with very 
large numbers of analogue light sensors. To maintain the usefulness of the robots, 
the design should also be flexible enough to accommodate other sensors of both 
a digital and analogue nature as well as allow some expansion facilities for co- 
processing units. 

Low Cost. As was mentioned above, more than one robot will be active in the 
environment at once. In fact, the aim was to have 6-8 robots active at any one 
time. Allowing for charging times of the robots, about 10 robots should be needed 
in total. On top of this, a small number of other units such as the food cubes 
will be needed. The need for multiple robots forced the need for low cost. Each 
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robot should cost less than £300. For instance, this is, at the current exchange 
rate at the time of writing, is under one quarter of the cost of a Khepera robot 
(which would still need further development). 

4 Platform Overview 

The biggest decision in the design of the platform was to move control processing 
from the robots to a desktop computer. This centralised all the control process- 
ing, sensor data collection, housekeeping and other environmental control and 
reduced the cost. It also meant that more processing power could be added easily 
by either replacing the machine or distributing the processing across a cluster. 
However, this placed an additional load on the communications network as each 
robot would need to be serviced quickly enough to allow it to react to a dynamic 
environment. The overall topology of the platform is shown in Fig Q The host 
computer communicates via a packet switched radio network to each robot on a 
polled basis, requesting new sensor values and returning updated motor control 
values. 

Inter-robot communications and those between robots and the environment 
are handled locally by using IR communications. The result of these exchanges 
are relayed back to the host computer to enable it to work out what interactions 
took place between which robots / environment objects. 



4.1 Host System Control and Housekeeping 

The host computer takes care of almost all of the processing and housekeeping 
needs of the experiments and its tasks can be broken up into three distinct 
blocks: robot control, packet radio and environment control and robot interaction 
tracking. Fig ^illustrates the three layers. The actual communication is always 
over the radio network. However, this is hidden from the higher layers and each 
robot controller is completely unaware of the communications network or any of 
the other traffic over that network. 

Robot Control. The biggest job of the host computer is to run the control 
system for each of the robots. For each robot, the host computer needs to con- 
figure the control system according to specifications (such as the robots genetic 
makeup), receive sensor information from the robot, process that information 
and send back motor control commands to adjust the robot’s behaviour. Each 
robot needs to be serviced approximately every twenty milliseconds. 

Further to this, when two robots mate, the host computer needs to appropri- 
ately create an offspring candidate. Because there is a strict limit on the number 
of robots in the environment at any one time, the problem of over-populating 
needs to be dealt with. The solution in this case was to pool all the ‘poten- 
tial offspring’ together with a time tag. When a robot becomes free, one of the 
offspring is selected from the pool based on a distribution biased by the time 
stamp. This means that, in good times, many offspring will never actually be 
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Fig. 1. Overview of the platform communication structure showing overall Packet 
Switched Radio and local IR communications. 



initialised. The size of the offspring pool is therefore quite a good indication of 
how well the robots have adapted to the current environment. 

Environment Control. As well as controlling each of the robots, the host 
computer needs to know when a robot has interacted with another element in 
the environment. Passive interactions such as hitting a wall are not recorded but 
interactions with other robots and food cubes are. When a robot bumps into 
another active component (robot or food cube) the two components exchange 
unique identifiers via IR and these are both relayed back to the host computer. 
The host computer then needs to match the interactions with robots and food 
cubes and act accordingly. If two robots bump into each other, there is a possibil- 
ity of offspring being produced (based on definable requirements such as energy 
level of one or both of the robots) . If a robot bumps into a food cube, the robots 
energy level needs to be adjusted up or down (the level will go down if the food 
cube is actually a ‘poison’ the robots should be avoiding). At the same time, the 
food cube’s energy level may need adjusting which may effect the light intensity 
or colour the food cube emits. 

The host computer also monitors the robot’s battery level and can therefore 
judge when a robot will not be able to support an offspring without the batteries 
going dead. At this point, the robot is marked for recharging and is not used 
again until its batteries are back to full strength. 

Radio Network. At the core of all communications with the host computer 
is the Packet Switched Radio network. This network is based on a host /client 
relationship with the host polling each client in turn. This is very similar to USB 
or BlueTooth. On a physical level, the host computer communicates with a radio 
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Fig. 2. The layers of communication in the Packet Switched Radio network. The bot- 
tom layer is the actual communication layer. On top of this environmental and low 
level robot control is built and on top of that is the link between each robot and its 
controller. 



‘dongle’ which acts as the host transmitter for the radio network. This dongle 
(based around the Motorola HcgosEJs bit, 8MHz microcontroller) is connected 
to the host computer via USB. The host computer controls scheduling of the 
polling (which robots and food cubes are polled and how often) as well as taking 
care of re-transmission requests due to collisions and errors on the network. 

The radio network itself is based on Radio Packet Controllers (RPCs) by Ra- 
diometrix0 These modules are capable of 64kbps transmission over the license 
free 433MHz band. The modules benefit from having a low (3ms) collision detec- 
tion time before transmission and the ability to schedule packet transmissions. 
Each packet may be up to 27 bytes long. The dongle, food cubes and robots all 
have RPC modules that interface directly to their microcontrollers. 



4.2 Robot Systems 

The robots are based around a Motorola HC908 8 bit 8MHz microcontroller 
(like the dongle) . This microcontroller supplies many of the requirements for the 
robots including PWM (Pulse Width Modulation) motor control, A/D convert- 
ing and numerous I/O pins. The microcontroller is linked directly to the RPC 
and the motor channels (for PWM motor control and feedback) but all other 
systems are connected via a multiplexed address and data bus. Furthermore, the 

^ http:/ /e-www. motorola.com/webapp/sps/site/ 
prod_summary.jsp?code=68HC908GP32&nodeId=01M98634 
^ http://www.radiometrix.co.uk 
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Fig. 3. Robot architecture block diagram. 



data part of the multiplexed bus may also be switched between analogue and 
digital modes. Fig 0outlines the systems. 

Core Systems. The robots core systems consist of the power systems and 
monitoring, radio communications, motor control and motor speed feedback. 
They are available whatever sensors are used and whatever expansion modules 
are added. 

The power supply for the robot comes from high capacity NiMH (Nickel 
Metal Hydride) batteries. These batteries can keep the robot running for about 
3 hours on average and can be fast-charged in about 1 hour, 15 mins. They 
are rated at 3.6 volts (for the motors) with conversion circuitry to supply the 
5v regulated line for the electronics. Battery level monitoring and fast-charging 
circuits are on-board and, during charging, the microcontroller keeps an eye on 
the batteries, informing the host controller when charging is complete. 

The motor control is based on a simple motor driver. The PWM signal is 
generated internally by the microcontroller and direction control comes from a 
latch at the low end of the digital address space on the multiplexed address 
lines. Both motors are independently controllable and feedback comes from two 
reflective IR opto-encoders (one per wheel) . These give an indication of the speed 
of the wheels which the microcontroller uses to adjust the motor speed using 
BID (Proportional Integral Differential) control. This feedback also prevents the 
motors being damaged if they stall when the robot hits a wall or other obstacle. 
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Radio communications via the RPC are also directly linked to the microcon- 
troller. The microcontroller looks after configuration of the RPC module and 
filters out unwanted packets (every RPC receives every packet on the network). 
Checks for errors in the data are made and valid packets (uncorrupt and destined 
for that robot) are then processed. Packets may request sensor or status values, 
set motor speeds or reconfigure the RPC itself. 

Sensors. In the experiments the robots were originally designed for, each robot 
has 192 light sensors (64 each of red, green and blue). These are accessed through 
the multiplexed address and data lines by first requesting the address of the 
sensor and then retrieving the data over the analogue data lines. In fact, to cope 
with so many sensors, a group of three sensors (one red, one green, one blue) 
are addressed and the relevant analogue data line is then selected for encoding 
by the microcontroller. 

Each light sensor is a phototransistor with a colour filter. These light sensors 
could be replaced by any component that outputs a voltage change with changing 
conditions of between 0 and 5 volts. Further, although there are only 64 addresses 
sensors in the current design (in order to keep the design small and the cost 
down), 256 could be addressed in theory and each address may have up to 8 
analogue channels giving a total of 2048 sensors. 

Environmental Interaction. Environmental interaction is defined as bumping 
into something. As such, the robots have four bump sensors. When a bump sensor 
is depressed, the robot must have either bumped into a wall, another robot or a 
food cube (or another robot bumped into it, the order is not relevant). When this 
event occurs, each robot / food cube transmits their unique ID over IR but only 
listens for an incoming signal on the side(s) where a bump sensor was activated. 
The received id, if any, is transmitted back to the host computer so an action 
may be taken. The robot control system (running on the host computer) often 
has NO direct knowledge of bumping into anything, only an indirect knowledge 
of energy levels in the robot going up or down or (in some experiments) the 
registering of having mated. This is configurable for different experiments. 

Expansion. As covered briefly above, the microcontroller accesses most of the 
systems on the robot via a multiplexed data and address bus which is 8 bits wide. 
Furthermore, there is an analogue/digital select line and a read/write line. In 
the current design, only the lowest 8 bytes of the digital address space are used 
(motor direction, IR transmit and receive [times 4], battery monitoring, bump 
sensors and misc. indicators) and only the low 64 addresses of the analogue space 
are used. The rest of the digital space is available for expansion for memory, 
memory mapped devices or sensors and outputs. The rest of the analogue space 
may be used for analogue sensors (there is no analogue output capability) as 
described above. 

4.3 Food Cubes 

The Food Cubes are based on the same design as the robots although the ana- 
logue data bus, motor control systems and sensor capabilities are not present. 
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Instead, the Food Cubes can emit combinations of three frequencies of light (red, 
green and blue). 

Core Systems. As with the robots, the food cubes have packet radio capabil- 
ities and battery monitoring facilities. However, they lack motor control, ana- 
logue sensor and expansion bus capabilities. The digital data bus is still present 
to allow the addition of further modules. 

Environmental Interaction. The food cubes have the same system of IR 
exchange as the robots do (and their own unique IDs) and they respond in 
exactly the same way as the robots when something bumps into them. Like the 
robots, the food cubes have controlling processes running on the host computer 
(although these are hand-coded) which allow them to react to environmental 
factors (such as being bumped into by robots). 

Unlike the robots, the food cubes are capable of emitting red, green and blue 
frequencies of light in differing quantities. Light level is based on PWM style 
dimming and so may not be usable with some light sensors because they rely 
on ‘persistence of vision’ effects (the maximum reaction time of the light sen- 
sor being a lot slower than the frequency the emitter is working at) . Due to the 
A/D conversion time on the HC908 processor, this will actually be the determin- 
ing reaction time in most cases, not the sensor itself. All three frequencies are 
completely independently controllable allowing for mixtures of colours. Green 
may represent the level of food available, red the level of poison - a balance the 
robots would have to learn about. Because the mating criteria for the robots is 
totally controllable, separate food and mating energy levels could be maintained 
with green being food and blue being ‘mating energy supply’. The brightness of 
the food cubes is deliberately not very high so that, under normal light levels, 
robots should have some difficulty spotting food sources from a distance (forcing 
a search requirement in their repertoire in order to survive). 

5 Limitations 

Unlike simulations, working in the real world does place a lot of restrictions 
on what you may do in an experiment and how you set the experiment up. 
Hardware choices have real consequences and two of the biggest restrictions on 
experimental setup are covered here. 

5.1 Interaction Tracking vs Continuous Tracking 

In a simulation, every movement of every entity is known precisely and may be 
tracked and recorded for later inspection. This continuous tracking of each entity 
has many advantages when analysing behavour. In the real world, the precise 
location, direction and speed of each entity is not so easily measured. The choice 
was made with this platform to forgo the continuous tracking capabilities due 
to cost. Instead, only the sensor values and motor commands are loggable, as 
are interactions with other robots, obstacles and food cubes. This logging still 
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provides a high degree of information but, unlike with continuous tracking, the 
information is more loosely time-coupled (with only interaction time-stamps 
being present). 

A solution to this may be filming the robots using a digital camera and adding 
the sensor, command and interaction log information to the video track timeline. 
This has been considered and it should be possible using an object-based video 
format (such as Quicktime or Mpeg4) and a high speed video transfer method. 
However, the equipment would be costly and no huge benefit is perceived cur- 
rently. 

5.2 Radio Network Limits 

In choosing off-board control for the robots and the Radiometrix RPC modules 
for communication, limitations were placed on the expandability of the platform 
to accommodate more entities. With the relatively low bandwidth available, it is 
estimated that a maximum of 10 robots and 5 other objects such as food cubes 
would be able to communicate with the host computer before the bandwidth is 
saturated. 

Alternatives to the Radiometrix RPC modules are available in the form of 
BlueTooth or IEEE 802.11b wireless Ethernet. However, BlueTooth is only capa- 
ble of hosting 7 active clients per network so multiple networks would be needed. 
802.11b and BlueTooth would also both have required much longer development 
times and higher cost. It is hoped that the robot design is expandable enough 
to cope with such modules if they are considered useful in future research. 

6 In Summary 

This paper presents an overview of a new hardware platform that will allow 
research in hardware evolution not possible with current platforms. It is cost 
effective and extensible to allow for future research needs and is not limited 
to evolutionary hardware experiments but rather any research requiring one or 
more entities working within the same environment. Although there are some 
limitations to the current platform and its use in certain experimental environ- 
ments, these could potentially be overcome with additional technology. 
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Abstract. This paper discusses some aspects regarding the use of uni- 
versal linear threshold elements implemented in a standard double-poly 
CMOS technology, which might be used for neural networks as well as 
plain, or mixed-signal, analog and digital circuits. The 2-transistor ele- 
ments can have their threshold adjusted in real time, and thus the basic 
Boolean function, by changing the voltage on one or more of the in- 
puts. The proposed elements allow for significant reduction in transistor 
count and number of interconnections. This in combination with a power 
supply voltage in the range of less than 100 mV up to typically 1.0 V 
allow for Power-Delay-product improvements typically in the range of 
hundreds to thousands of times compared to standard implementations 
in a 0.6 micron CMOS technology. This makes the circuits more sim- 
ilar to biological neurons than most existing CMOS implementations. 
Circuit examples are explored by theory, SPICE simulations and chip 
measurements. A way of exploiting inherit fault tolerance is briefly men- 
tioned. Potential improvements on operational speed and chip area of 
linear threshold elements used for perceptual tasks are shown. 



1 Introduction 

Floating-gate CMOS circuits have found their use outside the digital memory 
use during the last decade, and an introduction to the field can be found in JQ. 
Floating-gate devices are used as analog memory elements, as part of capacitive- 
based circuits and as adaptive circuit elements Non-traditional floating-gate 
circuitry is also starting to find it’s way into commercial use [0. The circuits 
presented here are offspring from [Q, which might be perceived as having their 
startingpoints from two basic ideas, namely the multiple-input floating-gate tran- 
sistor concept and UV-programmable floating-gate circuits 0, using the UV- 
programming method from [0. 

By exploiting the inherent analog amplifying characteristics of the transistors 
for something more than the traditional switching function similar to the method 
in one can reduce the number of active elements and interconnect dramatically 
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[0. Combined with operation of the transistors in weak inversion, where the 
currents are typically down in the sub pA to uA range [0, from experience 
with a standard 0.6 um CMOS process |0, there is a significant ultra low-power 
potential whether the building blocks are potentially used for analog, digital or 
mixed-signal circuits or neural network implementations. 

Section ^introduces one of several 0 real-time reconfigurable linear thresh- 
old elements and suggests ways it can be used for implementing a variety of 
basic Boolean functions. The ultra low power consumption potential and im- 
proved manufacturability are then briefly treated. In section ^strategies possibly 
increasing fault tolerance is given. 

Section Egives some pointers towards the potential chip area reduction, and 
thus production costs reduction, that might be obtained by using linear threshold 
elements. For some perceptual data processing the speedup of three orders of 
magnitude compared to a traditional computer implementation is given as an 
example. 



2 CMOS Floating-Gate Circuits in Weak Inversion 

2.1 Balancing the Circuit; Setting Switching Point 
and Current Levels 

Our circuit elements need to have their switching point adjusted by an initial 
UV-programming p], which is a post-fabrication technique that can be applied 
to any standard double-poly CMOS technology. The setting of the switching 
point ensures that the output voltage for the circuit element is at Vdd/2 for an 
input at exactly the same voltage level. This is done to achieve symmetric noise 
margins for digital zeros and ones and the best possible digital operation. 

During UV-programming the chip is radiated by UV-C from a lamp. Con- 
currently the power supply rails and substrates get certain voltages applied that 
regulate charge transport to the floating gates of the device. Once the UV- 
exposure is ended, the charge levels on the floating-gates are defined practically 
indefinitely. Charge transport is done through the UV-activated conductances, 
which are indicated by extra circles between gate and sources in the transistor 
symbols in Figure D 

One and the same UV-programmable elements might be reprogrammed for 
different current levels, due to for example different needs for operational speed 
of the basic circuit building blocks [Q. Switching current levels, Ibeqi might be 
programmed to 2-3 orders of magnitude |0 due to experience with the process 
from [0. 



2.2 Simple Analysis of a Basic Linear Threshold Element 

The following equations can be used to describe the weak inversion currents of 
the PMOS and NMOS transistors [ II 1 )J : 
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Here, ki = Ci /Ctot is the capacitive division factor of the ith input capacitor, 
Ci, and Ctot is the total capacitance seen from the floating-gate. Iheq is the 
balanced equilibrium current, which is the drain current of the transistors when 
all ordinary input signals and driven nodes are equal to Vdd/2. /f,eg is strongly 
dependent on the Vdd level. Here an equal number of capacitively weighted inputs 
to both the PMOS and NMOS transistor is assumed, and that the intrinsic slope 
factors in weak inversion, n, are equal for both devices. Ut is the thermal voltage, 
which is 25.8 mV at room temperature. 




OUT 



Fig. 1. A CARRY’-, NAND-, NOR and INVERT-circuit [E]. The numbers 20.8 and 1.2 
refer to the width and length of the transistors in micrometers, respectively. 



To illustrate some traits of the reconfigurable floating-gate circuits, the 
“P1N3” circuit in Figure [Qis used here. The capacitances between X,Y and Z are 
all designed for equal size. Also the sum of capacitances coupled to the NMOS 
(Figure D equals the capacitances connected to the PMOS. (Cx 2 + Cy 2 + Cz 2 = 
Cyi). The numbers of capacitively weighted inputs have been used to name the 
circuits, counting ordinary inputs and other inputs used for control of behavior 
to each two-MOSFET element. The circuit in Figure Qgets the name P1N3 due 
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to this naming system. The most used inverter may be called PlNl. The cir- 
cuit has previously been presented as a stand-alone circuit or building block |0. 
Using the above equations yields: 

Ids,p = Ibeq^xp (3) 

Ids,n = heqexp I — (^-14 + 3 ^!/ + 3 ^- - ^) } (^) 

Vp, for example, means the voltage on the capacitively weighted input to the 
PMOS transistor in Figure D If the voltages Vp,Vx,Vy or Vz are all equal to I4d/2 
the exponentials in the above equations equal 0, and we have the equilibrium 
condition with Ids,p=Ids,n=heq- To illustrate how the circuit functions logically, 
a truth table is used here. As parts of the truth table, Cp and e„ are used; these 
are the parts of the exponentials directly dependent on input signals and the 
supply voltage, Vdd- For this particular circuit that means 

ep = (^ - (5) 

= + + ( 6 ) 

When parasitic capacitances are not accounted for, each of the inputs at 
nodes X,Y and Z are weighted by 1/3, a more optimistic estimate than would 
have resulted from including parasitics [0. When Vp = Vddl‘2- and X, Y and Z 
are allowed to have the binary values according to the table in Figure 0 Figure Q 
results. When Cp > e„ the output approaches the Vdd level, since the PMOS is 
then “strongest” . In the opposite case it goes low. 



IVdd 


1 


|HIGH 1 




|0 


|LOW 1 



Fig. 2. Meanings of the 3 columns in each row are directly related. 



By inspecting the truth table (Figure E) it is clear that the value of the 
output depends on the number of I’s and O’s on the inputs only. Inspecting the 
truth table (Figure 0, just counting I’s and O’s in the input vector, makes it 
possible to get enough information out from a table with a simpler form, as in 
Figure 0 

Used this way (Figure 0 Figure 0 the circuit computes the inverted carry 
for a FULL- ADDER: 

OUT = CARRY' = {XY + XZ + YZ)' (7) 

From the truth table, in Figure 0 one can see that by letting any one input be 
“0”, the output is “0” if, and only if, both other inputs are “1”. Then the circuit 
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Fig. 3. The table shows parts of the exponentials, 6p, Bn, and output values, for all 
possible binary values of inputs X,Y,Z when Vp=Vddl‘2- “OUT” provides the CARRY’ 
function for a FULL- ADDER. 
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Fig. 4. The table shows parts of the exponentials, 6p, e„, and output values, for Vp = 
Vddl‘2 and different numbers of “Us” on ordinary inputs X,Y,Z. Inputs can be either 



“0” or “1”. 



implements the 2-input NAND function. If any one input is 1, the output is “1” 
if and only if both other inputs are “0” . Then the circuit works as a 2-input 
NOR gate. Connecting one input to Vdd or Vss and short-circuiting the other 
two gives an INVERTER. A 2-input inverting-structure like NAND or NOR is 
essentially the only function needed to implement any digital function. 

If the Cp value is changed, the “threshold” for when, and if, e„ > changes. 
From the above equations it is easy to see that setting Cp = —2Vddl^ makes 
it necessary to have only 1 binary input at “1” {Vdd) to make e„ = —Vddl^-, 
which is greater than Cp, and should give a low output. Changing the Cp value 
to 2Vdd/6 makes it necessary to have all inputs X,Y,Z high, when restricted to 
Boolean inputs, in order to produce a low output. When perceiving the digital 
functionality of the linear threshold element only, it is possible to make a table 
like in Figure 0 



Cp 


digital functionality 


-2Vdd/G 


NOR3, NOR2, INVERT 


0 


CARRY’, NOR2, NAND2, INVERT 


2Vdd/G 


NAND3, NAND2, INVERT 



Fig. 5. The table shows part of the exponential, 6p, and the Boolean functions resulting. 
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Fig. 6. Actual measurements for the P1N3 circuit, from a prototype chip, when it is 
producing the CARRY’ of a FULL-ADDER 0 



Changing the threshold by adjusting the value of the input capacitively cou- 
pled to the PMOS can provide a real-time reconfigurable digital circuit. 

2.3 Measured Functionality of P1N3 Linear Threshold Element 

The inherent functionality for Cp = 0, demonstrated by measurements, can be 
seen in Figure 0 The voltage on the output goes low if, and only if, 2 or 3 
of the inputs X,Y,Z goes high at the same time. Otherwise the output stays 
high, which in this case is Vm = 0.8 V. The three curves (Figure E] correpond 
to transitions between the four rows in Figure 4. The continous line shows the 
output voltage when the number of high inputs moves from 1 to 2, which is the 
transition between 1 and 0 on the output, for the CARRY’ function of binary 
addition. 

The functionality and performance of such circuits will depend on the imple- 
mentation technology of choice. The number of inputs, their capacitive weights, 
if an input is coupled both to the PMOS and NMOS are among other factors that 
can be varied [EJ- To increase the voltage gain, the size of the drawn capacitances 
between inputs and floating-gates should be increased [0. 
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Year 



Fig. 7. If nothing is done about power consnmption, power will get out of hand and 
Moore’s law cannot continue M- 



2.4 Ultra Low Power Consumption 

Dynamic power consumption depends linearly on the physical capacitance be- 
ing switched m Therefore using less interconnect and fewer active elements 
for a given function may be attractive for minimizing power consumption. Volt- 
age reduction offers the most direct and dramatic means om minimizing energy 
consumption m- Chip measurements have demonstrated floating-gate circuitry 
working for supply voltages down to 93 mV m. while typical supply voltage lev- 
els have been in the 300 - 800 mV range |Q. SPICE simulations using the “P5N5” 
linear threshold elements as building blocks in an 8-transistor FULL- ADDER 
operating at a Vdd of 200 mV compared to a standard cell implementation in 
the AMS 0.6 CMOS technology showed that the floating-gate implementa- 
tion used 2500 times less energy, a Power-delay-Product (PDP) of 2.3 fJ while 
running at a clock frequency of 1 MHz [0. Power consumption is an increasingly 
important issue, which is illustrated in Figure Obased on a slide from the pre- 
sentation of [EJ- The PDP numbers could maybe not be undercut by any known 
CMOS technology of the same generation. 

2.5 Matching and Manufacturability 

This technology has some very attractive features, but an important problem 
that remains is that the UV-programmable circuits have not been demonstrated 
for circuits containg more than between 10 and 30 transistors transistors yet, at 
least from the open literature. Hopefully this number can be increased signifi- 
cantly, due to a proposed design method in 0, using as few different building 
blocks as possible. Since “The Achilles Heel of analog is that every transistor 
is different” [IE] matching m of both transistors and passive elements should 
become the best possible if a larger system could be built from identical build- 
ing blocks instead of dedicated circuitry for each basic function. Good matching 
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Fig. 8. Fanlt tolerance improvement scheme. 



means a good relative accuracy between identically constructed circuit elements. 
Certain chip layout techniques m improve matching, as well as increasing the 
size of the drawn circuitry. 

3 Increasing Faul Tolerance for Floating-Gate Linear 
Threshold Elements in Weak Inversion 

One way to increase fault tolerance is to introduce redundancy on the gate 
level by letting, for an example, three identical gates (or more) with similar 
inputs drive the same output, as is illustrated in Figure El Then if one out of 
the arbitrarely number of redundant gates fails to take part in producing the 
correct output, the others could still maintain proper functionality. The steep 
dependency on output current and output resistance of the transistors from the 
gate voltage, in weak inversion compared to the classical area of operation, might 
make the principle useful. Letting more than one FGUVMOS circuit element 
share a common output node has been implemented earlier gz], p], but for 
different purposes. 
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4 Using Linear Threshold Elements 

for Speeding up Perceptual Processing 

The circuit elements herein are linear threshold elements, which are basic pro- 
cessing units in certain neural networks A classic model of a neuron is a 
linear threshold device, which computes a linear combination of the inputs, com- 
pares the value with a threshold, and outputs -1-1 or (-1) if the value is larger 
than the threshold [ IT7| . 

Real neurons are found in the biological nervous systems. Human brains are 
by far superior to computers in solving hard problems such as combinatorial 
optimization and image and speech processing, although their basic building 
blocks are several orders of magnitude slower, which have boosted interest in 
the field of artificial neural networks 1^, [^. 

While neural networks have found wide application in many areas, the limi- 
tations and behavior of such networks are far from being understood ED]. 

Despite the power of digital computers they are not clever enough in the 
sense of perceptual and “intelligent” processing like seeing an object in the vi- 
sual field, recognizing what it is, and taking proper action in real time. For 
biological systems, including humans, in general those are generally effortless 
tasks m- Such tasks are extremely difficult even for state-of-the-art computers. 
According to m the performance gap could never be narrowed by just increas- 
ing the clock frequencies of MPU’s, integration densities of memories and further 
sophistication of software programs. 

What is sometimes denoted “Intelligent data processing” tasks of today are 
presented as software programs which is reduced to a series of simple binary 
operations executed on MPU chips. However, in our brains the algorithms and 
the hardware are inseparably merged in the system m To carry out the intel- 
ligent data processing directly in hardware, “neuron MOS transistors” are ex- 
ploited in m These transistors are multiple-input floating-gate devices where 
the floating-gate potential is determined as a weighted sum of multiple input 
signals via capacitance coupling, and thereby controls the current in the chan- 
nel. The neuron MOS transistors are so named due to their similarity to the 
McCulloch-Pitts model of a neuron m- An association processor architecture 
utilizing these principles has been verified to about three orders of magnitude 
higher performance as compared to typical SISC processors of the time ED- 

To which extent the linear threshold elements briefly described in this paper 
could switch places with the neuron MOS transistors in m could be researched 
further. 



5 Using Linear Threshold Elements 
for Decreasing Chip Area 

Assuming that each threshold gate can be built at a cost comparable to that of 
traditional AND, OR, NOT, termed AON logic, neural networks can be much 
more powerful than traditional logic circuits m 
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Fig. 9. Number of gates necessary to implement certain functions, as a function of 
number of bits, n. Number of gates using traditional AND-OR-NOT logic increases 
exponentially, while the increase can be linear when linear threshold elements are used. 



In CMOS the production costs of a chip has a strong dependency on the chip 
area. Since the area of basic linear threshold elements, like P5N5 0 and P1N3, 
are in the same order as normal gates, the costs might be comparable on the 
gate level. 

For some functions, like XOR, the number of elements in a traditional AON 
circuit will grow exponentially with the number of bits in the input, while when 
implemented using linear threshold elements the number of gates are linear in 
the number of input bits m- Generally, a depth-2, AON circuit computing XOR 
of n bits requires at least 2"“^ + 1 gates. A Linear Threshold circuit needs only 
n-hl gates. Figure Qillustrates these relationships for n=l, • • • ,7. 

Another example of potential use of threshold logic m is: Whereas any logic 
circuit of polynomial size (in n) that computes the product of two n-bit numbers 
requires unbounded delay, such computations can be done in a neural network 
with “constant” delay. The product of two n-bit numbers and sorting of n n-bit 
numbers can be computed by a polynomial-size neural network using only 4 and 
5 unit delays, respectively. Unit delay is equal to a “depth” of one for an artificial 
neural network ED]. 

Symmetric Boolean functions depend only on the sum of input values, and 
since the parity function is symmetric it can be computed in two layers of a 
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neural network whereas it takes unbounded delay to compute parity in a logic 
circuit [ rr| . 

For many years the topic of linear threshold logic has been approached in 
two different ways: theory on computational circuit complexity on one hand, and 
hardware implementation on the other. There has been very little interaction 
between the two approaches, as was stated in m- 

6 Conclusion 

One type of real-time reconfigurable linear threshold element implemented in 
floating-gate UV-programmable CMOS technology has been presented, includ- 
ing measurements from a prototype CMOS chip demonstrating functionality. 
Power-Delay-Product numbers are among the lowest reported, and there are an 
interesting potential for implementing certain functions using less area then by 
traditional logic. 

To determine the true computational power, due to number of inputs and 
capacitive weights for an example, further research should be done. 

A simple proposal for how to make fault-tolerant circuitry using our linear 
threshold elements has been proposed. 

An example on how similar CMOS circuitry could be used in speeding up a 
certain perceptual signal processing task about 3 orders of magnitude was briefly 
mentioned. 

Due to the above mentioned aspects the true potential of this immature 
technology should be explored further. 
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Abstract. A McCulloch and Pitts neuron implemented in nano-tech- 
nology is simulated. The neuron is created with SET circuits operating 
in the single-electron current regime. Therefore the extremely low power 
properties of the SET devices are fully exploited. This neuron is simu- 
lated with a SPICE description model for a single SET junction. 

Keywords: SPICE, Single-electron tunneling (SET), Random Back- 
ground Charge (RBC) fluctuations, McCulloch and Pitts Neuron, Low 
Power. 



1 Introduction 

A very promising device in nano-technology is the single-electron tunneling 
(SET) junction. This device consists of two conductors, with a very thin insu- 
lator in between. For example aluminum, aluminum-oxide and aluminum. The 
distance between the two conductors is in the order of a few nm. It is possible for 
an electron to tunnel through the insulator, but only when the absolute voltage 
difference across the junction will be decreased due to the event [Q. 

The two major problems this device is facing are the random background 
charge (RBC) fluctuations and the low operating temperature. The last problem 
is nearly solved. Some SET circuits can already operate at room-temperature 

The RBC fluctuation problem is not solved at the device-level yet, but it 
is possible to design useful circuits with these junctions using redundancy to 
cope for this type of error |Q. A neural network is a possible way to generate 
redundancy; this is described in this paper. The combination of neural networks 
and SET technology has advantages in two ways. First, a powerful neural network 
requires a large amount of neural nodes. This implies each neuron has to be small 
in dimensions due to restricted available space. Also, the power dissipation of 
each neural node has to be extremely low in order to have an acceptable overall 
power-dissipation. In these aspects, SET devices are very suitable to use in neural 
networks, because they are extremely small and can they can process information 
using only a few electrons. Secondly, SET devices introduce errors like RBC 
fluctuations and show stochastic behavior. A neural network can compensate for 
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these small errors and is therefore a suitable architecture for SET devices. In 
it is stated that when SET circuits operate in a good shielded environment, the 
RBC fluctuations only appear a few times a day, this error is small enough to 
compensate for in the neural network architecture. The neural network described 
in this paper could also be made in CMOS technology, but the power-dissipation 
would be orders of magnitude higher, even at the lowest possible current. Also 
the required physical space would have been increased with respect to the SET 
variant. An example of an McCulloch and Pitts neuron implemented in SET 
technology is discussed and simulated with a SPICE description model for a 
single SET junction E- With the SPICE model, it is possible to perform a 
transient analysis for any arbitrary SET circuit operating in the single-electron 
current regime (SER). From these transient simulations a static/dynamic plot 
can be extracted. This plot can be a convenient tool for designing SET circuits. 
In section 2, the SPICE model is discussed in brief. In section 0 the three 
island SET structure (SIS) is discussed and simulated using SPICE. In section 
0 the SIS is used as a building block to create a McCulloch and Pitts neuron. 
A small neural network to perform optical character recognition is discussed as 
an example. 

2 The SPICE Model in Brief 

The SPICE model is based on the impulse model [0 for the single SET junction, 
shown in Fig. 0 The model consists of a true capacitor with a current source 
placed parallel. When the junction voltage Ucj exceeds the critical voltage 
the current source is triggered and effectively a charge of exactly the elementary 
charge e is transferred between nodes (1) and (2), by means of the delta function. 
Mathematically: it = e6{t — to). Note that the critical voltage is defined as: the 
maximum allowed voltage across the junction just to remain in the Coulomb 
blockade |0 region. 



(1)c 



TJ 



(2)c^ 



(1)0 






Ci 






(2)0 





0| i, = e6(t-g 



(a) 



(b) 



Fig. 1. (a) SET junction network symbol, (b) impnlse model of the SET junction. 



The SPICE model is a direct implementation of the impulse model, shown 
in Fig. 0 However, in the SPICE user environment the delta pulse function is 
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not available. The function to transfer the charge e across the junction is chosen 
to be a rectangular shaped pulse, for which e = if dt holds. This rectangular 
shaped pulse is implemented with a one-shot function. This one-shot function 
is created with programmable SPICE building blocks. All other components of 
the SPICE model can be found in the standard SPICE libraries. 




Fig. 2. SPICE implementation of the impulse model. 



The values of the capacitors and the tunnel-time are scaled in order to stay 
within the numerical range of SPICE. The voltage levels are not scaled. The 
pulse width dt is chosen to be 1 ms (in the scaled model). This means with this 
model it is possible to simulate currents up to 1000 electrons per second, which 
is equivalent to 160/A. This current region, from Coulomb blockade region up 
to 160/A is defined as the single-electron current regime (SER). When SET 
circuits are simulated outside this region, the current will be clipped to 160/A. 

With the SPICE model it is also possible to generate a static/dynamic plot for 
any given SET circuit. To generate a static/dynamic plot, two external sources 
are sweeped (or stepped) and one or more outputs of interest are plotted versus 
the sweeped external sources. The outputs can be either voltage levels or tunnel 
event signals taken directly from the one-shot function in the model, or combi- 
nations of those signals. The plotted output signals can be either stochastic or 
deterministic. This static/dynamic plot can be a convenient tool for developing 
SET circuits. For example, the designer can get direct insight in values to chose 
for biasing sources for the particular purpose. 

The RBC fluctuations are modelled using SPICE by applying a certain charge 
(fluctuation) to specific circuit nodes. 

In section 0 the three island structure, also known as the SET-inverter, is 
simulated using the SPICE model described. Both a transient simulation and a 
static/dynamic plot are generated. 

3 The Three Island SET Structure 

The SIS was first introduced by Tucker |0. The main idea behind this structure 
was to copy the working principle of a CMOS inverter onto SET technology, 
using 2 SET transistors. The SIS consists of four tunnel junctions and three 
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islands (a), (b) and (c). The SIS operates completely in the voltage domain. An 
advantage of working in the voltage domain only, is the lack of needing high 
ohmic resistors e.g. for biasing. Another advantage is there is no need for signal 
transformation to couple a SIS to one another. The SIS is shown in Fig. 0 




Fig. 3. The three island SET structure, with Cgp = Cg,2 = 8 aF, Cs,i = Cs ,2 ~ 7 aF, 
capacitances of TJi ..T J 4 are (7g,i .. Cg ,4 = 1 aF and Ci = 24 aF for digital operation 
and 2.4 fF for analog operation, Ug = 6 mV. 



The SIS is able to operate in four different modes, namely: 

— A analog stochastic mode 

— A digital stochastic mode 

— A analog deterministic mode 

— A digital deterministic mode 

The modes can be directly read from the static/dynamic plot, shown in 
Fig.i] This plot indicates in which of the four modes the structure operates 
with respect to the applied external voltage sources Ug and Ug. To obtain this 
plot, Ug was sweeped from 0 to 40 mV, and Ug was stepped from 0 to 10 mV 
in 100 steps. The output quantity plotted, is a combination of the normalized 
output voltage level Uout and the tunnel event signals taken from the one-shots. 
In this way, the output voltage and the stochastic regions are clearly visible. 
With this plot, the designer is able to get direct insight in the values to choose 
for the sources and capacitance values, in order to set the SIS in the proper 
operation region. 

Both the analog versus the digital mode and the stochastic versus the de- 
terministic mode depend on the values of the external supply voltages and the 
values of the capacitances. The stochastic mode is the mode where free tunneling 
will occur. To exploit the extreme low power properties of the SIS, the stochastic 
mode should be avoided. 
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Fig. 4. Static/Dynamic plot of the SIS. Plotted is a combination of Uout (normalized) 
and the tunnel event signals versus Us and Ug. The dark area equals output logic level 
‘O’, the light area equals output logic level ‘1’ and the speckled area is the stochastic 
region. 



The set of capacitance values are chosen as follows: Cj^i .. Cj ^4 are chosen 
1 aF, as a starting point. Cgp and Cg ^2 should be an order of 10 times bigger 
than Cj^i to let Ug have more effect on island voltages a and c. Cs,i and Cg ^2 
are control nodes for the offset voltage of islands (a) and (c), and should be as 
effective as the gate capacitances, so they should be chosen of the same order of 
magnitude. For the sake of convenience, we took the values of |0, see Fig. 0 

For the actual design of the neural circuitry, the value of Us is chosen 6 mV, 
to stay out of the stochastic region and to have inverter operation at a low gate 
voltage Ug (around 4 mV), see Fig. 0 For digital operation, Ci is chosen to 
be 24 aF, because only one electron can effectively be held with 6 mV supply 
voltage. For analog operation Ci should be chosen as large as possible in order 
to have many values between Uouti„„ and Uout^igh ■ ^ig. Q a transient SPICE 
simulation is shown for the SIS, with Cioad = 240 aF. 

It can be seen from Fig. Qthat amplification in the voltage domain is possible; 
when for example the input voltage is varied between 4 and 5 mV, the output 
voltage changes approximately 6.5 mV. The reason for choosing the SIS for 
the signal-processing instead of simpler SET circuits is because of the buffer- 
function, which is needed for a high coupling factor, which on its turn is highly 
preferable in a neural network. The SIS was used to create a McCulloch and 
Pitts neuron, this neuron is discussed in section 0 



382 



Rudie van de Haar and Jaap Hoekstra 




Ug [mV] 

Fig. 5. Plotted is Uout versus Ug for the SIS in the deterministic mode {Us = 6 mV), 
with Cl = 240 aF, which means that at most 10 electrons can be held on Ci. 




Fig. 6. The basic McCulloch and Pitts neuron. 



4 The McCulloch and Pitts Neuron 

The basic McCulloch and Pitts neuron is shown in Fig. Qlt requires binary input 
signals for the input vector X, and analog input signals for the weight vector W. 
The input vector X is multiplied by the weight vector W. These signals are 
summed and fed through a hard-limiter. (In brief: the inner product X ■ W is 
thresholded) . The output signal Y is on its turn also binary. The threshold-level 
for the hard-limiter can be set by means of signal Sq. The signal- type of Sq is 
analog. It is also possible to treat the signal Sq as an input signal like X^, which 
is convenient to use in the SET circuits. 

For each neuron the following requirements must be met: 

— The neuron must be capable of driving many other neurons, 
this means: a buffer-function is required 

— The overall signal amplification must be greater than, or equal to 1 
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Fig. 7 . McCulloch and Pitts neuron in implemented SET technology. 



As discussed in the last section, the SIS is capable of amplifying the signal in 
the voltage domain. Therefore, the voltage levels in vector W have an offset value 
of 8 mV, which will be divided across Cxi and C^i, therefore the working area 
of the SIS is shifted to Ug > imV, see Fig. Q This structure has also a buffering 
function and is therefore capable of driving many other SIS, which is needed for a 
powerful neural network. In Fig. 0a McCulloch and Pitts neuron implemented 
in SET technology is shown. In this figure, the coupling capacitors are given 
by Cg^x- The value of these capacitors were chosen as follows: In order to have 
maximum coupling from S/^o .. 3ISn+i to 3ISn+2, Cco ■■ Ccn+i must be chosen 
as large as possible (oo). On the other hand, to have minimal mutual influence of 
S/S'o .. 3ISn+i onto each other, Cco ■■ Ccn+i must be a chosen as low as possible 
(0). The best tradeoff values were found to let Cco .. Ccn+i be smaller than Ci 
of 31 So .. 3ISn+i, and larger than Cgy of 3ISn+2, so 100 aF is taken. With the 
same arguments Cxo ■■ Cxn+i and Cwo ■■ Cmn+i were also chosen to be 100 aF. 
With these values, a 4 input neuron is successfully simulated. Other examples 
of SET neurons can be found in literature, e.g. [ Pirn] . but they operate in the 
stochastic mode, and therefore consume more power than the neuron presented 
in this paper. 

As mentioned in the introduction, SET devices are capable of operating with 
faulty elements. The following example is used to calculate the effect at the 
output of a disturbance due to a random background charge fluctuation in one 
of the SET devices. 
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Fig. 8. Neural network for OCR to detect digit ‘2’. 



In Fig. 0 a neural network for optical character recognition (OCR) is shown. 
With this neural network it is possible to check if the input equals digit ‘2’ or 
not. The input consists of a binary signal X, comprised of 25 bits. This is a feed- 
forward network, which consists of two layers. The first layer (iVo^i) contains 
25 neurons and the second layer (Ni^i) contains 1 neuron. Each neuron has 25 
inputs and 1 output. All nodes in the first layer are placed in parallel, this means 
they all have the same weight vector W. The node in the second layer performs 
the function of majority voting element. 

Assuming that the neural network is trained, e.g. using the back-propagation 
learning technique m, the maximal allowable error, in case of majority voting 
will be: 

£(>-) = + 2 ), n = 24 (1) 

Where (n-|-l) equals the number of SIS in the first layer. As can be seen from 
equation 0 the errors in the second layer are unacceptable, because when such 
a node is in error, the output is directly in error. Therefore the nodes in the last 
layer should not be made in SET technology. Note that this is true for a general 
neural network made with the nodes simulated in this paper. For the nodes in 
the other layers, the worst case scenario means 12 out of 25 nodes may be faulty 
in order to let the network still work properly. 

The error per neural node, with respect to the SIS building blocks, (Fig. 0, 
is given by equation 0 

e(Y) = e(3IS)n+i + e(S /^)„+2 + (2) 

n 

This means when an error arises at the threshold 3IS(n-|-2) or the offset 
SlS(n-l-l), the whole node will be faulty. The other SIS are operating in the ma- 
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jority voting domain, which means, for an 25 input node, the maximum number 
of faulty elements for good operation is at maximum 12. 

From equations ^ Etnd 0 we can conclude when the last layer neurons are 
not made in SET technology, that in the worst case scenario, when the errors 
appear at the SIS in positions (n+2) and (n+1) in the first layer (iVoy), than the 
maximum number of allowable faulty elements (SIS) equals 12. From the above 
discussion we can conclude that when this neuron is placed in a good shielded 
environment, this means the RBC fluctuations will only appear in the order of 
a few times per day 0] in the worst case, than the circuit can at least operate 
for 6 days without any error. 

5 Conclusions 

A McCulloch and Pitts neuron in SET technology is simulated using SPICE. 
This neuron is based on the SIS operating in the SER, therefore the extremely 
low power properties of SET circuits are fully exploited. The SIS was simulated 
using a SPICE model for a single SET junction. An example of an OCR network, 
using the neurons proposed is simulated and discussed. Also a small error analysis 
for this neuron is given. 
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Abstract. Accurate models are needed for understanding the biological 
neuron in its totality. The need for these accurate models is in a evolvable 
system like a Brainy Processor, understanding unexplored functionality 
of the brain and fault simulation of the different cortex regions. The accu- 
racy of the model increases computational complexity. Existing software 
simulators would be extremely slow in terms of performance for large 
number of neurons. General-purpose processors would have architecture 
unsuitable to tackle the heavy floating point computation involved in 
modeling more accurate neuronal assembly models. To overcome this 
problem, quite a few ASIC chips to simulate simple neural models like 
(Leaky integrate and fire. Spiking neuron, Hodgkin and Huxley) have 
been developed. The ASIC architectures proposed are only for simple 
neuron models where several characteristics are disregarded to lessen 
hardware complexity. Suggesting ASIC chips for every type of neuronal 
assembly would be cost prohibitive. To deal with hardware complexity 
and different classes of neuronal assembly, a programmable hardware 
unit or in a wider sense an instruction driven general-purpose processor 
architecture is proposed. The functional units of this architecture are 
tailored to evaluate complex stochastic neuronal assembly models. 



1 Introduction 

Neuron models find extensive application in evolvable systems. Further these 
models can find application in developing a brainy processor m- Understanding 
the unexplored functionality of the brain and fault simulation of the different 
cortexes of the brain could be other major applications. These demand accurate 
and almost complete neuronal assembly models involving wide characteristics. 

Several software-based simulators are available for modeling the neuronal ac- 
tivities in an assembly, notable among these is jSj. To overcome the problems of 
time complexity, these software simulators adopt simple models for a neuron or 
for a neuronal assembly. Neurons are modeled as point sources thereby greatly 
reducing the computational complexity and the mode of interaction across the 
neurons in an assembly m To overcome the time constraints existing in soft- 
ware simulation,several hardware implementations have been proposed. 
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The present day implementations in VLSI use the integrate and fire model 
[ED- The model is deterministic and does not take care of the inherent stochas- 
tic nature of the neuronal activities. Suggesting ASIC chips for every type of 
neuronal assembly would be cost prohibitive. 

To deal with hardware complexity and different classes of neuronal assembly 
including stochastic assembly models, a programmable hardware unit or in a 
wider sense an instruction driven general-purpose processor architecture is pro- 
posed in this paper. We adopt the model proposed by Gopinath Kallianpur 
as a thin linear cable upon which the Wiener process is imposed. This model is 
further extended in to a neuronal assembly, which is represented by coupled 
SPDEs. 

Based on the analysis of the computational structures of different neuronal as- 
sembly models including [E], a general-purpose NAM (Neuronal Assembly Mod- 
eling) processor architecture is proposed in this paper. The different floating 
point functional units of this processor are designed for evaluating complex tran- 
scendental functions, numerical techniques and for evaluating Wiener processes 
in space and time. The NAM processor needs very high memory processor band- 
width to match the performance of the functional units and also when dealing 
with massive neuronal assemblies. The architecture of the General purpose pro- 
cessors will not match with complex floating point functional units needed for 
a NAM processor. When we try to include these as special purpose modules to 
a general purpose architecture, the memory processor bandwidth would not be 
sufficient to match the computational power of these special purpose modules. 
The architectures of DSP processors are not tailored for solving complex partial 
differential equations and stochastic processes. 

Section two deals with the stochastic model of a single neuron and the neu- 
ronal assembly. Section three presents the architecture. The fourth section deals 
with the applications of the NAM processor 



2 A Stochastic Model for a Single Neuron 

There is a considerable amount of literature devoted to the implementation 
of neurons in VLSI. However most of these models utilized the integrate and 
fire principle. There is a need to consider the more complete model of spatial 
geometry of the neuron. The stochastic model as in is presented below both 
for single neuron and neuronal assembly. 

du d‘^'11 

— = —au + 0<x<b;t>0 (1) 

Where a and /3 are positive constants such that 

g(o,<) = g(M) = ov<>o 

u{x, 0) = Uq{x) 



(2) 
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To equation Dthe random neuronal activities is obtained by including the ef- 
fect of random impulses, which follow a generalized Poisson process. Accordingly 
the equation □is replaced by 

du ^d'^u 

- = -au + P^ + W{t,x), (3) 

Where W{t, x) is the space time gaussian white noise which is a fictitious deriva- 
tive of the space time wiener process W{t,x)t > 0,0 < a; < 6. Equation 0can be 
rewritten in terms of an Ito stochastic partial differential equation (SPDE). 

dX{x, = —LX{x, t] Lo)dt + dWtx{ui),X{0, x, oj) = uo{x) (4) 

By applying concepts of SPDE we can show the solution to equation E]is given 
as 

nt nt nb 

u{t,x,u;)= G{t,x,y)uo{y)dy + / G{t - s] x,y)dWsy{uj), (5) 

Jo Jo Jo 

The above single neuron model is extended to study the asymptotic behavior 
of the voltage potentials of interacting spatially distributed neurons. A rigorous 
treatment of the above equation is found in l^. To consider the parallel fiber in- 
teractions in a neuronal assembly, a simple and special case of Markin’s model is 
taken into account. The corresponding coupled pair of linear equations converted 
into SPDEs; is given as. 



dui 

Ik 

du2 

Ik 



= —aui + 



= —au2 — 



r + R d'^ui R d‘^U 2 

7C dx"^ 7C dx'^ 

^ d'^ui r + R d'^U 2 ^2 

7c dx"^ "fc dx^ 



( 6 ) 



where a = k/c,g = r(r -I- 2i?), and k,c,r,R are all positive constants. For R = 
0 the equations reduces to E]Finally the solution is given as 

00 

Ui{t, a:) = ^ A’^{t)y}n{x)G = 1, 2. (7) 

0 



Where for each i, = 1,2,... is a sequence of independent Ornstein Uh- 

lenbeck Processes given by 






with 






= e 

the various being independent wiener processes. 



ph ph 

I gi{x)(pn{x)dx + / 

0 Jo 



( 8 ) 
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Fig. 1. Architecture of the NAM Processor 

3 Processor Architecture 

A detailed analysis of the different neuronal models governed by ODE, PDE and 
SPDE shows their solutions involve evaluation of complex exponential functions 
and need the application of different numerical techniques to arrive at their solu- 
tions. In case of SPDEs additionally the Wiener process has to be implemented. 
The derivation presented in section 2 towards finding the solution of coupled 
SPDEs reveal the above-mentioned facts. The RHS of equation (Q can be split 
into two parts one for evaluating the complex exponential functions, application 
of appropriate numerical techniques for evaluating the integral and differential 
part and the stochastic component by the Wiener process. Hence the concept 
behind proposing a processor architecture for modeling complex neural assembly 
can be made more of a general purpose in nature applicable to a specific domain, 
biological neuronal modeling. 

The different functional units of this processor should be, one dedicated to- 
wards evaluating complex exponential functions, the second for application of 
numerical techniques and the third for evaluating Wiener processes in space 
and time. These techniques may include Simpson’s Method, Runge Kutta, Gill’s 
Method [p. Besides these we need a general-purpose arithmetic unit for per- 
forming intermediate processing needed for almost all numeric techniques. Also 
the general-purpose arithmetic unit helps in linking computationally the out- 
puts from exponential functional unit, Wiener process unit and the numeric 
function unit. Lastly, the ROM as a part of the Multiport memory space is used 
to store reciprocals of factorials and other kernel coefficients. The coefficients 
from ROM can be loaded from the DRAM and switched of to save power. The 
general-purpose processor architecture is shown in Fig. 1. 

The connectivity between neurons is extremely high(several 1000s in fact) 
[E0- This means that the concerned parameter variations are appreciable. Spa- 
tial variations along the soma surface are in the wider sub micron range. In 
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(A) 



(B) 



Fig. 2. A. Data flow across NFU,GPAU and GPRs for Gill’s Method B. Architecture 
of the NFU 



addition, potential variations across the neuron (Including dendrites) are also 
appreciable. The computations in the functional units are dominated with ir- 
rational and transcendental numbers. All these factors demand floating-point 
functional units. 

To achieve high performance the functional units should be capable of per- 
forming pipelined processing. The arithmetic units used in these functional units 
are multiple operand adder (MOA), Parallel Array Multiplier and Chain Multi- 
plier(CM). Carry Save Adder(CSA) tree Structures are employed to realize these 
pipelined functional units achieving a pipelining rate equal to a full adder delay 

[El- 

Having finalized the need for floating point pipelined functional units, the 
System Level Architecture should possess Instruction Level Pipelining to achieve 
overall ultra high performance. 

3.1 Architecture of the Numeric Function Unit (NFU) 

The analysis of different numeric techniques showed that most frequently used 
operations are of the type Multiple Operand Addition and Chain Multiplica- 
tion(More than two operands) [0. Fig 2A shows the data flow for Gill’s method 
across the GPAU, NFU and the GPRs. Fig 2B shows the functional architecture 
of the NFU. The MOA and the CM employed are pipelineable. Based on the 
type of numeric techniques applied either partial or complete pipelining may 
take place. As an example we show how the GILL’s method is implemented. 

Gill Algorithm Execution in NFU 

Input: yl, y2, y3... set of differential equations 
Procedure GILL 

Begin Set x=xO, yl= ylO, y2= y20, y3= y30; 

Set j=l 
While (x<x2) 

For 1=1 to N // Done in GPAU 
ki = f (x, yl, y2, y3. . . ) 

Compute C = (ki - bjqi) // done in the GPAU 
N = cj*ki 
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L = h*aj* (C) //Done In NFU 
M = 3* aj* (C) 
yi = yi + L 
qi = qi + M-N 
Store yi,qi in GPR 
Next I 

j = j+1 

If j = 5 Then 

If X >= x2 Then Stop; 

End if 

Else 

j =1 

End if 

End if 

If j = 3 Then 

X = x + h/2 

End if 
End While 
End Procedure 



3.2 Architecture of Exponential Unit (EFU) 

The architecture shown in the Fig. 3. performs pipelined evaluation of an arbi- 
trary power series corresponding to the value of the given input x. It is used to 
generate the values for the common exponential operations like massive expo- 
nential addition or massive exponential multiplication and other equations of the 
form Co -I- cix^ + C2X^ + -I- . . . where cq, ci, C2, C3 . . . are coefficients. The Fig. 

3 shows the pipelined architecture of the EFU. There are two multiplier stages, 
ml, m2, m3 pipelining along the X-axis and the other Ml, M2, M3 along the Y 
axis. Pipelining in the output CSA cluster also takes place along the X-axis. As 
an example when evaluating an exponential function upto nine terms, a pipelin- 
ing rate equal to five full adder stages can be obtained. This can be verified from 
the space time diagram shown in Fig. 4 and analyzing the CSA tree based array 
architecture of the multiplier m- 

The series is evaluated taking three terms at a time. The evaluation of 
as an example up to 6 terms is is given in the algorithm below. The constant 
coefficients are factorials and reciprocals or in general a scalar fetched from the 
SCR (Series Coefficient Register). These coefficients may be a part of the input 
or a set of intermediate results generated by the GPAU as a part of general 
processing. 

Algorithm used to calculate in EFU 

Input : X . 

Output: S, the value of e~x upto 6 terms Ml, M2, M3 are 

outputs from multipliers Ml, M2, M3 respectively Procedure EXP (x) 

For k=0 to 3 

Ml = x‘(k+l) 
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M2 = x*(k+2) 

M3 = x*(k+3) 
a = Ml / kl 
b = M2 / k2 
c = M3 / k3 
S = S + a + b + c 
k = k + 3 



Next 



// Achieved by the 3*2 Pipelined CSA tree 
// Achieved by the Feedback of x“(k+3) 




An Arbitrary Power Series 



Fig. 3. Pipelined Architecture of the EFU 



3.3 Architecture of Wiener Process Unit 

The Wiener process is described by a normal distribution with mean zero and 
variance one and its derivative gives the White noise spectrum. The normal 
distribution is of the form e — x^/2. To evolve an architecture the Wiener process 
is discretized as 

dw = \/diN{0, 1) 

The architecture has a random number generator and a multiplier. The random 
number generator is designed using the linear congruential method m which 
is given as 
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Fig. 4. Space time diagram of the EFU 
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Xn+i = (aXn + c)modm 

Where a is the constant multiplier,c is the increment, X q is initial value, 
the n+l^h random number and m the modulus. A functional level architecture 
is shown in the Fig. 5. There are other algorithms too for generating random 
numbers. One can design architecture for these algorithms and employ them as 
functional units [llhll tij . 

Procedure Wiener 
Begin 

dW(l) = sqrt (dt) *randn; 

W(l) = dW(l) ; 

/ / Done in WPU 

// Read from the coefficient registers the values of a,m,c 
WP = a*xn 
SP = WP + c 
Xn+1 = SP mod m 
Compute (Xn+1) “2 / 2 
Store in GPR xl 
/ / Done in EFU 
R = e-xl*2/2 

Generate the value for the given x 
/ / Done in WPU 
dw = 0.04472 * R 
Return result dw back to GPAU 

End 

End Procedure 



3.4 Memory and Register Architecture 

Modeling the neuronal assembly of the brain as a whole is just impossible. A par- 
titioned approach corresponding to the different regions of the brain is favored. 
The neuronal count even in regions like auditory cortex, cerebral cortex is quite 
enormous in hundred’s of thousands of neurons. This leads us to conclude that 
even with powerful processor architecture such as the NAM processor, trying to 
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Fig. 5. Architecture of the WPU 



simulate neuronal assembly of a particular region is an extremely difficult and 
time consuming process (Maybe days). Currently no integrated mathematical 
approach exists to deal with neuronal assembly involving 100s of neurons. 

Simulating the models of a realistic neuronal assembly demands an array 
processor. The node of this array could be the NAM processor. In this con- 
text, the processing capability needed of the NAM processor may get reduced 
to an assembly involving tens of neurons. With this background the memory 
needed may be in tens of megabytes. The exponential unit needs series coef- 
ficient registers for storing the appropriate coefficients relating to a particular 
series evaluation. At any given instant series are summed taking three terms at 
a time and this is evaluated for three cycles for generating nine term arbitrary 
power series. Hence the EFU can have upto a maximum of twelve 80-bit floating 
point registers according to IEEE standard. 

Similarly the NFU needs registers for storing the coefficients corresponding to 
a given numerical technique like Gill’s Method, Runge Kutta, Simpson’s method. 
Based on this it is decided to have sixteen 80 bit floating-point registers. There 
is a set of registers meant for general processing mostly used by the GPAU. Also, 
taking the constraint of the compiler into account and the addressing modes, it 
is fair enough to decide on 16 floating point and 16 fixed-point registers. 



3.5 Instruction Set Architecture 

The instruction set is a combination of general purpose and special purpose 
instructions. The general-purpose instruction set includes standard arithmetic; 
control and data transfer instructions under the standard format. 

The special purpose instruction set is grouped into exponential types, nu- 
meric types and Wiener types. Fig. 6. Shows the instruction format. The formats 
for the special purpose instructions are given below. 



G/S 


OPCODE 


SOURCE 


DESTINATION 



Fig. 6. Instruction Format for the NAM Processor 
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where G / S General or Special Purpose Instruction, Source, Destination : 
memory, registers; 

Addressing Modes: The individual functional units including GPAU are pro- 
vided with several registers to have more of register oriented data transfer in- 
stead of repeatedly accessing the memory. Such a register configuration can well 
support direct and indirect addressing modes. 




Fig. 7. General Purpose Application of the NAM Processor 



4 General Purpose Application of the NAM Processor 

In this paper we have presented the architecture of the NAM processor and 
the NAM instruction set. The Fig. 7 shown includes the NAM compiler and 
the NAM NL (Neuronal Language) . Because of the highly special purpose char- 
acteristics of the NAM Processor architecture it is essential to develop special 
language constructs to specify the complex mathematical model of any kind of 
neuronal assembly. If existing high-level languages were employed, the program- 
ming complexity would be very high. The neuron models could be point source, 
or linear or spatial, “integrate and fire” or sigmoid. The neuronal assemblies can 
belong to olfactory, auditory or vision cortexes. 

5 Conclusion 

A General-purpose processor architecture is proposed for simulating different 
types of neuronal assembly. This has a definite advantage over an ASIG de- 
sign targeting a particular model and particular type of neuronal assembly. The 
instruction set proposed has both general purpose and special purpose instruc- 
tions. 

For modeling very large neural assembly, one has to evolve a mathematical 
approach for solving large number of coupled stochastic partial differential equa- 
tions. As of now there are no such mathematical models available. If such models 
are proposed, it may involve a matrix based solving. To tackle this matrix pro- 
cessing of very high order. The NAM processor may need additional functional 
units like a 4*4 matrix multiplier or a 4*4 matrix inverter and other matrix 
oriented operations. 

A major impact of this proposed NAM processor is towards ultimate mod- 
eling of the complex functionalities of the brain with fault simulations and de- 
signing evolvable systems like brainy processors m- This NAM processor could 
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be employed by biological neuronal researchers in understanding the mystery of 
the brain. 
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Abstract. This paper presents a proposal based on binary particle 
swarm optimization to design combinational logic circuits at the gate- 
level. The proposed algorithm is validated using several examples from 
the literature, and is compared against a genetic algorithm (with inte- 
ger representation), and against human designers who used traditional 
circuit design aids (e.g., Karnaugh Maps). Results indicate that particle 
swarm optimization may be a viable alternative to design combinational 
circuits at the gate-level. 



1 Introduction 

Kennedy & Eberhart [6] proposed an approach called “particle swarm optimiza- 
tion” (PSO) which was inspired on the choreography of a bird flock. The idea of 
this approach is to simulate the movements of a group (or population) of birds 
which aim to find food. The approach can be seen as a distributed behavioral 
algorithm that performs (in its more general version) multidimensional search. 
In the simulation, the behavior of each individual is affected by either the best 
local (i.e., within a certain neighborhood) or the best global individual. The ap- 
proach uses then the concept of population and a measure of performance similar 
to the fitness value used with evolutionary algorithms. Also, the adjustments of 
individuals are analogous to the use of a crossover operator. However, this ap- 
proach introduces the use of flying potential solutions through hyperspace (used 
to accelerate convergence) which does not seem to have an analogous mechanism 
in traditional evolutionary algorithms. Another important difference is the fact 
that PSO allows individuals to benefit from their past experiences whereas in an 
evolutionary algorithm, normally the current population is the only “memory” 
used by the individuals. PSO has been successfully used for both continuous 
nonlinear and discrete binary optimization [4, 6-8] . 

As far as we know, this paper presents the first attempt to use PSO to design 
combinational circuits. 



A.M. Tyrrell, P.C. Haddow, and J. Torresen (Eds.): ICES 2003, LNCS 2606, pp. 398—409, 2003. 
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Fig. 1. Matrix used to represent a circuit. Each gate gets its inputs from either of 
the gates in the previous column. Note the encoding adopted for each element of the 
matrix as well as the set of available gates used. 



2 Problem Statement 

We used the same matrix representation to encode a circuit as in some of our 
previous work [1,2]. Such representation is shown in Figure 1. This matrix is 
encoded as a fixed- length string of bits or integers from 0 to fV — 1, where N refers 
to the number of rows allowed in the matrix (we call it n-cardinality alphabet). 
In this paper, we will be referring to our GA that uses an n-cardinality alphabet, 
since we have found in the past that this version of the algorithm consistently 
produces better results than its binary counterpart [2]. 

More formally, we can say that any circuit can be represented as a bidimen- 
sional array of gates Sij, where j indicates the level of a gate, so that those gates 
closer to the inputs have lower values of j. (Level values are incremented from 
left to right in Figure 1). For a fixed j, the index i varies with respect to the 
gates that are “next” to each other in the circuit, but without being necessarily 
connected. Each matrix element is a gate (there are 5 types of gates: AND, NOT, 
OR, XOR and WIRE^) that receives its 2 inputs from any gate at the previous 
column as shown in Figure 1. Although our implementation allows gates with 
more inputs and these inputs might come from any previous level of the circuit, 
we limited ourselves to 2-input gates and restricted the inputs to come only from 
the previous level. This restriction could, of course, be relaxed, but we adopted 
it to allow a fair comparison with our previous GA-based approach. 

^ WIRE basically indicates a null operation, or in other words, the absence of gate, 
and it is used just to keep regularity in the representation used, since otherwise 
would have to use variable-length strings. 
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Input 1 



Input 2 



Gate Type 



Fig. 2. Encoding used for each of the matrix elements that represent a circuit. 



A chromosomic string encodes the matrix shown in Figure 1 by using triplets 
in which the 2 first elements refer to each of the inputs used, and the third is 
the corresponding gate from the available set. 

The matrix representation adopted in this work was originally proposed by 
Louis [9, 10]. He applied his approach to a 2-bit adder and to the n-parity check 
problem (for n = 4,5,6). This representation has also been adopted by Miller 
et al. [11, 12] with some differences. For example, the restrictions regarding the 
source of a certain input to be fed in a matrix element varies in each of the three 
approaches: Louis [9] has strong restrictions, Miller et al. [11] have no restrictions 
and we have relatively light restrictions. The encoding is also different in all cases. 
Louis [9] only encoded information regarding one input and the type of gate to be 
used at each matrix position. He also used binary representation. In our case, we 
have used both an n-cardinality alphabet and a binary alphabet and we encode 
the gate to be placed at each matrix location plus its two inputs. Miller et al. 
[11] encode a full Boolean operation using a single integer. This representation 
is more compact, but it has the problem of requiring that mutation takes the 
place of crossover to introduce enough diversity in the population, so that the 
evolutionary algorithm can approach the feasible region. 

Finally, the last difference among the three approaches previously mentioned 
is regarding the fitness function. Louis [9] simply maximizes the number of 
matches between the outputs produced by the circuit and those indicated in 
the truth table. We have used a fitness function that works in two stages: first, 
it maximizes the number of matches (as in Louis’ case). However, once feasible 
solutions are found, we maximize the number of WIREs in the circuit. By doing 
this, we actually optimize the circuit in terms of the number of gates that it 
uses. Miller et al. [11] did something similar to Louis until recently (they have 
recently introduced a two-stage fitness function like the one adopted by us [5]). 

Thus, we can say that our goal is to produce a fully functional design (i.e., one 
that produces all the expected outputs for any combination of inputs according 
to the truth table given for the problem) which maximizes the number of WIREs. 

3 Description of Our Approach 

The main motivation for using particle swarm optimization (PSO) to design 
combinational circuits is that this algorithm has been found to be very efficient 
in a variety of tasks [8] . Note however, that most of the successful uses of PSO 
reported in the literature deal with real numbers representations and in this 
case, we will be using a binary encoding. Although a real numbers represen- 
tation is possible (and in fact, we are currently working on such a version of 
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the algorithm), the preliminary results reported in this paper were found with a 
binary representation that worked reasonably well. The use of real numbers to 
represent a circuit requires a more sophisticated genotype-phenotype mapping. 
Next, we will describe the details of our implementation. 



1. For i = 1 to M (M = population size) 

Initialize P[i] randomly 

(P is the population of particles) 

Initialize V[i] = 0 {V = speed of each particle) 

Evaluate P[i] 

GBEST = Best particle found in P[i] 

2. End For 

3. For i = 1 to M 

PBESTS\i] = P\i] 

(Initialize the “memory” of each particle) 

4. End For 

5. Repeat 

For i = 1 to M 

V\i] = V[i - 1] -b <^1 X {PBESTS\i] - P[t\) 

+$2 X {PBESTS[GBEST] - P[i\) 

(Calculate speed of each particle) 

(^1 and ^2 are upper limits used to 

draw positive random numbers from a uniform distribution 
POP\i] = P[i\ + V\i] 

If a particle gets outside the pre-defined hypercube 
then it is reintegrated to its boundaries 
Evaluate P[i\ 

If new position is better then PBESTS[i\ = P[i\ 

GBEST = Best particle found in P[i] 

End For 

6. Until stopping condition is reached 



Fig. 3. Particle swarm optimization pseudocode 



The general algorithm of binary PSO is shown in Figure 3. In PSO, there are 
two types of information available for the particles so that they can make the best 
decision regarding where to move next. One of these is its own search experience 
(i.e., a particle has passed through several states and it “knows” which of them 
has been the best so far). Additionally, it also knows about the performance of 
the particles in its neighborhood (i.e., it knows which are the best states that 
its neighbors have reached so far). These two pieces of information correspond 
to the individual learning and the cultural transmission, respectively [8]. 
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Mathematically speaking, the binary version of PSO is defined such that the 
probability of an individual’s deciding zero or one (i.e., false or true) is [7]: 

P{Xid{t) = 1) = f{Xid{t - l),Vid{t - l),Pid,Pgd) (1) 



where: 

^ P{xid{t) = 1) is the probability that individual i will choose 1 for the bit at 
the d-th position of the binary string. 

— Xid{t) is the current state of the string position d of individual i. 

— t refers to the current iteration. 

“ Xid{t~^) is a measure of the individual’s predisposition or current probability 
of deciding 1. 

— Pid is the best state found so far. 

— Pgd is the best state found in the neighborhood so far. 

Although the main adjustment expression used by PSO can be seen as a form 
of mutation, we found out that its explorative power was not enough in circuit 
design. Therefore, we added a uniform mutation operator such as the one used 
with traditional genetic algorithms. This operator, however, was only applied 
to a certain percentage of the population (this is a parameter defined by the 
user). From our experiments, we determined that a value between 1% and 3% 
was appropriate to setup the percentage of the population subject to mutation. 

Table 1. Truth table for the circuit of the first example. 
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4 Results 

We used several examples taken from the literature to test our AS implementa- 
tion. Our results were compared to those obtained by two human designers and 
a genetic algorithm with an n-cardinality representation (see [2] for details) . 



4.1 Example 1 

Our first example has 3 inputs and 1 output ant its truth table is shown in 
Table 1. In this case, the matrix used was of size 5x5, and the length of 
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Table 2. Comparison of results between our PSO algorithm, the n-cardinality GA 
(NGA), and two human designers for the circuit of the first example. 



NGA 


Human Designer 1 


F = Z{X + Y) 


F ^ Z{X®Y) + Y{X®Z) 


4 gates 


5 gates 


2 ANDs, 1 OR, 1 XOR 


2 ANDs, 1 OR, 2 XORs 


PSO 


Human Designer 2 


F = ({Y + Z)X)®YZ 


F = X'YZ + X(Y S> Z) 


4 gates 


6 gates 


2 ANDs, 1 OR, 1 XOR 3 ANDs, 1 OR, 1 XOR, 1 NOT 




Fig. 4. Graphical representation of the best circuit found by our PSO algorithm for 
the first example. 



each string representing a circuit was 75. Since 5 gates were allowed in each 
matrix position, then the size of the intrinsic search space (i.e., the maximum 
size allowed as a consequence of the representation used) for this problem is 5*, 
where I refers to the length required to represent a circuit {I = 75 in our case). 
Thefore, the size of the intrinsic search space is 5"^® « 2.6 x 10^^. Fitness is 
computed in the following way: 8 (number of outputs that we must match to 
have a feasible circuit) + 5x5 (size of the matrix) - number of gates used (i.e., 
different of WIRE)). Therefore, a fitness of 29 (the best value produced for this 
circuit) means that the circuit is feasible (otherwise, its fitness could not possibly 
be above 8), and that it has 4 gates (i.e., 21 WIREs), because 8 + (25-4) = 8 + 
21 = 29. 

The graphical representation of the best circuit produced by PSO is shown in 
Fig. 4. Our PSO algorithm found this solution using the following parameters^: 
90 particles, 300 iterations (i.e., 27,000 evaluations of the objective function were 
required), = ^2 = 0.8, Vmax = 3.0, Pm = 1%. The parameters used by the 
NGA were the following: crossover rate = 0.5, mutation rate = 0.5/75 = 0.0022, 
population size = 90, maximum number of generations = 300. 

The comparison of the results produced by PSO, the NGA and two human 
designers are shown in Table 2. In this case, human designer 1 used Karnaugh 



^ These parameters were empirically derived. 
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Maps plus Boolean algebra identities to simplify the circuit, whereas human 
designer 2 used the Quine-McCluskey Procedure. 

PSO produced feasible circuits 100% of the time and it found the optimum 
in 7 out of 20 runs performed (i.e., 35% of the time), reaching a fitness of 28 in 
all the other runs. The average fitness of the 20 runs performed was 28.35, with 
a standard deviation of 0.49. The graphical representation of the best solution 
found by PSO is depicted in Figure 4. 

On the other hand, the best solution that the NGA could find using the 
same population size had also a fitness of 29 (i.e., a circuit with 4 gates), but it 
appeared only 10% of the time. Also, 20% of the time, the best solution found 
by the NGA was infeasible. The average fitness of these 20 runs was 21.4, with a 
standard deviation of 8.438009244. In this example, our PSO algorithm showed 
a better performance (on average) than the NGA. 

Table 3. Truth table for the circuit of the second example. 
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4.2 Example 2 

Our second example has 4 inputs and one output, as shown in Table 3. In 
this case, the matrix used was also of size 5x5. Our PSO algorithm used the 
following parameters: 200 particles, 1000 iterations (i.e., 200,000 evaluations of 
the objective function were required), = ^2 = 0.8, Vmax = 3.0, Pm = 3%. The 
parameters used by the NGA were the following: crossover rate = 0.5, mutation 
rate = 0.0022, population size = 200, maximum number of generations = 1000. 

The comparison of the results produced by PSO, an n-cardinality GA (NGA), 
a human designer (using Karnaugh maps), and Sasao’s approach [13] are shown 
in Table 4. Sasao has used this circuit to illustrate his circuit simplification 
technique based on the use of ANDs & XORs. His solution uses, however, more 
gates than the circuit produced by our approach. 
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Fig. 5. Graphical representation of the best circuit found by PSO for the second ex- 
ample. 



Table 4. Comparison of results between PSO, an n-cardinality GA (NGA), a human 
designer and Sasao’s approach for the circuit of the second example 

NGA 

F = {WYX' ®{{W + Y)®Z®{X + Y + Z))Y 

10 gates 

2 ANDs, 3 ORs, 3 XORs, 2 NOTs 

Human Designer 1 

F = {{Z'X) © iY'W')) + {{X'Y){Z © W')) 

11 gates 

4 ANDs, 1 OR, 2 XORs, 4 NOTs 

PSO 

F = {XY + W)S) {{Z © X){X + Y)Y 
7 gates 

2 ANDs, 2 ORs, 2 XORs, 1 NOT 
Sasao 

F = A' © Y'W © XY'Z' © X'Y'W 

12 gates 

3 XORs, 5 ANDs, 4 NOTs 



Our PSO algorithm found a solution with a fitness value of 34 (i.e., a circuit 
with 7 gates) 20% of the time, and feasible circuits were found 67% of the time. 
The average fitness of the 20 runs performed was 29.35, with a standard deviation 
of 7.4. The graphical representation of the best solution found is depicted in 
Figure 5. 

The best solution that the NGA could find using the same population size 
had a fitness of 31 (i.e., a circuit with 10 gates), and it appeared only 20% of the 
time. Also, 65% of the time, the best solution found by the NGA was infeasible. 
The average fitness of these 20 runs was 20.25, with a standard deviation of 
7.68. In this example, our PSO algorithm showed a better performance than the 
NGA. 
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Table 5. Truth table for the circuit of the third example. 
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4.3 Example 3 

Our third example is a two-bit adder (4 inputs and 3 outputs), and its truth 
table is shown in Table 5. The matrix used in this case was again of size 5x5. 
Our PSO algorithm used the following parameters: 300 particles, 2000 iterations 
(i.e., 600,000 evaluations of the objective function were required), = <?2 = 0-8, 
Vmax = 3.0, Pm = !%• The parameters used by the NGA were the following: 
crossover rate = 0.5, mutation rate = 0.0022, population size = 300, maximum 
number of generations = 2000. 

The comparison of the results produced by PSO, an n-cardinality GA (NGA), 
and one human designer (using Karnaugh maps) are shown in Table 6. 

Our PSO algorithm found a solution with a fitness value of 66 (i.e., a circuit 
with 7 gates) but only once in the 20 runs performed. Feasible circuits were 
found only 20% of the time. The average fitness of the 20 runs performed was 
48.85, with a standard deviation of 6.82. The graphical representation of the 
best solution found by our PSO algorithm is depicted in Figure 6. 

The best solution that the NGA could find using the same number of fitness 
function evaluations had a fitness of 66 (i.e., a feasible circuit with 7 gates). This 
solution also appeared only once in the 20 runs performed. However, the NGA 
could find feasible solutions 75% of the time. The average fitness of these 20 runs 
was 58.2, with a standard deviation of 7.17. It can be clearly seen that in this 
example the NGA performed better (on average) than our PSO algorithm. 



5 Conclusions and Future Work 

We have presented the first formal proposal to use binary particle swarm op- 
timization for designing combinational logic circuits. The approach presented 
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Table 6. Comparison of results between our PSO algorithm, an n-cardinality GA 
(NGA), and one human designer for the circuit of the third example. 



NGA 

Ti = B © D 
B 2 = (A © C) © BD 
F3 = AC + BD{A®C) 

7 gates 

2 ANDs, 1 OR, 4 XORs 

Human Designer 1 

Fi = A © D 

F 2 = (A © C)D' + ((A © G) © B)D 
= AC + BD{A + C) 

12 gates 

5 ANDs, 3 ORs, 3 XORs, 1 NOT 

PSO 

Ti = B © D 
F 2 = {BD) © (A © G) 

B 3 = (AG) + ((BB)(A©G)) 

7 gates 

3 XORs, 3 ANDs, 1 OR 



seems promising, since it produced competitive (and in some cases better) results 
with respect to an n-cardinality genetic algorithm (except for the last example) 
and it consistently outperformed the solutions produced by human designers. In 
fact, our experiments^ indicate that our PSO is very competitive with circuits 
that have only one output, but the approach is less robust when dealing with 
multiple-outputs circuits. 

One of the current limitations of our approach is the exploratory power of 
the algorithm which is still not as good as we expected. This is more evident 
in circuits with several outputs such as the two-bit adder in which our PSO 
algorithm had a poorer performance than the NGA (on average). 

As part of our future work, we are planning to introduce a population-based 
approach such as the one proposed in [3] to improve the search capabilities of 
our algorithm. We are also working on an indirect representation that allows us 
to use real numbers to represent circuits. We hypothesize that PSO will have a 
better performance if we can manage to produce a real numbers representation, 
since there is plenty of evidence of such positive behavior in the literature [8] . 

Acknowledgements 

The first author acknowledges support from CONACyT through the NSF-CO- 
NACyT project number 32999-A. The second author acknowledges support from 

® There are several more examples available which were not included due to space 
limitations. 




408 



Carlos A. Coello Coello et al. 




Fig. 6. Graphical representation of the best circuit found by our PSO algorithm for 
the third example. 
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Abstract. We present a systematic procedure to synthesise and min- 
imise digital circuits using propositional satisfiability. After encoding the 
truth table into a canonical sum of at most k different products, we seek 
its minimal satisfiable representation. We show how to use an interesting 
local search landscape for this minimisation. This approach can be very 
useful since we can generate exact minimal solutions within reasonably 
computational resources. 



1 Introduction 

This note reviews the synthesis and minimisation of digital circuits using proposi- 
tional satisfiability (SAT). Here we shall use a SAT encoding for circuit synthesis, 
and an incomplete randomised SAT solver for minimisation. That is, we shall 
translate the synthesis as a boolean decision problem, and minimisation as find- 
ing satisfiability on different encodings. Although the encoding of a truth table 
into a logical formula is not a new issue [7, 8], it has been largely unknown in the 
circuit design community. Regrettably, they show an over estimation of clauses 
and variables, also, perhaps it had little impact because they lack a detailed 
analysis of suitable SAT solvers for this encoding. 

Here we introduce a corrected estimation for variables and clauses, as well 
as to point out a useful behaviour in the SAT solver that allow us to estimate 
minimal circuits. By providing both pieces we can thus overcome some of their 
initial inconveniences and reintroduce it as an interesting comparison point. By 
using this encoding and solving procedure one can generate minimal circuits in 
terms of products in canonical form. Notice that, the minimisation to k terms 
does not correspond to the minimal number of gates, but to the minimal fc-levels 
of logic on Programmable Logic Arrays. 

The rest of this paper is organised as follows. Firstly, we introduce some 
background on SAT, and the transformation used here. Secondly, we explain 
how it can be used to extract the minimal number of products. Thirdly, we show 
an interesting behaviour of a randomised SAT solver on these encodings. Finally 
we give some results, comparisons, and conclusions for this note. 

2 Satisfiability and Transformations 

The propositional satisfiability problem, or SAT for short, is a key issue in the- 
ory of computing. Gonceptually, a SAT formula is an ensemble of m clauses in 



A.M. Tyrrell, P.C. Haddow, and J. Torresen (Eds.): ICES 2003, LNCS 2606, pp. 410—421, 2003. 
(c) Springer- Verlag Berlin Heidelberg 2003 




A Note on Designing Logical Circuits Using SAT 411 



conjuntive normal form, Where every Ci is a disjunction of at least one 

binary variable, or its negation, from the set a;i, . . . , Indeed, the question is 
whether or not exists a truth assignment that satisfies all clauses. In fact every 
clause can be seen as one restriction to handle, and overall satisfaction is reached 
whenever all clauses are satisfied. This search space is therefore at most 0(2”) 
in a naive approach, or 0(1.3302") by using wise randomisation [5]. 

Due to this exponential worst case, endeavours in SAT research has centred 
around two foci, complete and incomplete solvers. Roughly speaking, complete 
procedures speed-up the inference by applying a set of rules, early conflict de- 
tection, etc. These procedures, regrettably, still show painful exponential growth 
in the number of variables. 

Meanwhile, incomplete search procedures heuristically set and refine truth 
assignments. In this row, most of the incomplete solvers are based in randomised 
local search. These ideas of heuristic search usually produce very fast results, 
but nevertheless they are incomplete on whether the formulae is unsatisfied at 
all. Despite that fact, we should remember one useful property of Monte Carlo 
simulations, if the algorithm is run repeatedly with independent random choices 
each time, the failure probability can be made arbitrarily small, regardless of 
the input. That is why is common to have restarts in many search procedures. 

Here we use a well-known and simple but incomplete algorithm, WalkSAT 
[12]. As we shall see later, WalkSAT does a greedy variable selection, and, with 
some small probability p, flips a variable from an unsatisfied clause, even if this 
variable does not gain too much in comparison to others. 



2.1 Encodings 

The NP-completeness concept, which is based on the Cook-Levin work, deals 
with the idea of polynomial transformation from a problem Pi to Pj . Notice that 
such transformations must preserve the essential result, i.e. should Pi returns 
“yes”, also Pj must return “yes” under the same problem input. 

Among all possible transformations, we shall use one due to Kamath et al. 
[7,8]. They show how to transform a truth table (n inputs, m outputs) in its 
equivalent boolean decision problem: is it possible to build a circuit with at most 
k products? The encoding maps any truth table to logic gates AND, OR, and 
NOT. Optimal circuits can be found by minimising free parameter k, which is 
the number of different products in canonical form used to represent the circuit. 
By moving k one may therefore ask for a valid assignment to the formulae Aj, 
up till finding satisfiability. Once a satisfaction has been found, we may decode 
it to produce a valid and fully functional circuit. This mapping will help us to 
construct minimal combinatorial circuits in terms of minterms. 

Whilst this circuit synthesis technique has been used in a couple of studies 
on SAT, it has been largely unknown outside that world, even for the circuit 
design community. We think such fact is perhaps due to an over estimation of 
variables and clauses, or maybe because their example simply does not follow 
from their equations. 
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Here we present the formulation, a reader interested in reading the original 
derivation should refer to the corresponding article [8] . We shall assume a given 
truth table T, and a parameter k for the wished number of different minterms. 
From T we identify A entries or rows as inputs and outputs, xf and respec- 
tively, where upper index represents the row in T and lower index the column. 
We also need a set of pairs {a, q} such that the output has a value “true”, as 
well as the {a,q} for “false”, denoted as and respectively. To follow the 
same notation in [7, 8] , we use auxiliary variables a as: 






s'j ii xf = I 
Sji if xf = 0 



Next five sections corresponds to the final formulae Ek'. 
n X k clauses type 1: 



For i = 1 to n 
For j = I 



Sji V s 



ji 



to k 



|F°| X k clauses type 2: 



F" or j = 1 to k 

For {a, q} s.t. 2/^=0 

Wjq'^ (VLl^jfc) 



|F^| clauses type 3: 



For{a,q} s.t. = 1 

y j=l Zjq 



\Y^ \ X k clauses type 4a: 



For j = 1 to k 

For {a,g} s.t. j/“ = 1 



Zjq V Wjq 



n X k X |y^| clauses 4b: 



For i = 1 to n 

For J = 1 to k 

For {a, q} s.t. = 1 

V (T^- 
jq 3'^ 

As can be seen, variables (w) and clauses (c) grow^ as: 

V = k X {2n + m+ 

c=kx{n + n\Y^\ + \Y^\ + \Y°\) + \Y^\ 



( 1 ) 

(2) 



^ Previous expressions were k x (2n-|-m(l -|- A)) variables, and k x (n -|- 1 | ) -I- |y^ |(1 -I- 
k{2 -I- n)) clauses. 
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2.2 An example 

We shall describe the complete formulae for the following small truth table: 



xi X2 X3 


to 


0 


0 


0 


0 


0 


0 


0 


1 


0 


0 


0 


1 


0 


1 


1 


0 


1 


1 


0 


0 


1 


0 


0 


0 


1 


1 


0 


1 


1 


1 


1 


1 


0 


1 


1 


1 


1 


1 


1 


0 



For this specific example we have 3 inputs (n), two outputs (m), and we set k 
to three, then: 







|ri| = 8 
|r°| = 8 



The following formulae is derived from the equations: 



Type 1 
spi V s'l 1 

52.1 V s'24 

53.1 V s'3,1 

51.2 V s'i 2 

52.2 V S 2.2 

5 3. 2 V S 3 2 

5 1.3 V S'l 3 

52.3 V S 2 3 

■53.3 V S 3 3 



Type 2 

thip Vsip Vsi^2 V si_3 

tCl.2VSpiVSp2VSi,3 

tCl,! V Sip V Sip V s'lp 
Wl ,2 V Sip V Sip V S3 3 
tCl,! V Sip V s'l 2 V s'l 3 

'F'1,2 V Sip V s'lp V s'lp 
^1.1 Vs'ipVsipVsip 

XCip VSii\/Si2VSi3 

tt'2,1 V S2p V S2p V S2P 
tZ'2,2 V S2,l V S2,2 V S 2,3 
W 2 ,l V S2,l V S2,2 V S2 3 
tZ'2,2 V S2,l V S2,2 V S2 3 
W 2 ,l V S2,l V s'2 2 V S2P 
W 2,2 V S2P V s'2 2 V S2P 
tt'2,1 V s'21 V S2P V S2P 
W 2,2 V S21 V S2 2 V S2 3 
tC 3 ,l V S 3 P V S 3 P V S 3 P 
tC 3,2 V S 3 P V S 3 P V S 3 P 
^ 3,1 V S 3 P V S 3 P V Sgp 
tC 3,2 V S 3 P V S 3 P V S3 3 

W 3 .I VS 3 p VSgpVSgp 
W3.2 V S3P V S3 2 V Sgp 
^ 3,1 V S31 V S 3 P V S 3 P 
tt'3,2 V S31 V S3 2 V S3 3 



Type 3 

^ 1 , 1 '^ ^ 2 , 1 ^ 3,1 
Zip V Z22 V Z32 
Zip V z|p V z|p 
■^ 1 , 1 '^^ 2 , 1 '^ ^ 3,1 
Zip V zfp V Z32 
■ 2 ^ 1 , 1 '^ ^ 2,1 V 2:3 1 
Zip V z^pV Z32 
^ 1 , 1 ^ ^ 2,1 V 2:3 1 



Type 4a 

Zip Vwip 
zfpVwip 
zfpVwip 
^1,1 Vwi,i 
zfpVwip 
Zip Vwip 
zJpVwip 
Zip Vwip 

z|i VW2,1 
z| 2 V W2,2 
Z2pVW2,2 
zfpV W2,l 
Z2pVW2,2 
4 p VW2p 
Z 2.2 V tC2,2 
zfp VW2,1 

^3,1 VW3,1 
^3,2'^ W3, 2 
^3,2 V W3,2 
lVW3,l 
^3,2VW3,2 

4.1 VW3,1 

4.2 V W3,2 
4,1 Vlf3,l 
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Type 4b 



^1.1 


V Sip 


^3,2 


V 


■Ss,! 


2 ® 

^2,2 


V 


S 2.2 


2^ 

^1,2 


V 


Sl,3 


^1.2 


V Sip 


Z^ 

^3,1 


V 


S3,l 


2^ 

^2,1 


V 


sj .2 


2^ 

^1,1 


V 


sj,3 


^1.2 


V sjp 


7^ 

^3,2 


V 


S3,l 


2 '^ 

^2,2 


V 


sj .2 


^3 

^2,1 


V 


S2,3 


^1.1 


V sjp 


^3,1 


V 


S3,l 


2® 

^2,1 


V 


sj .2 


^3 

^2,2 


V 


S2,3 


7^ 

^1.2 


V sjp 


^3,2 


V 


■Ss,! 


^3,1 


V 


S 3, 2 


2^ 

^2,2 


V 


S2,3 


^1.1 


V sjp 


^3,1 


V 


S3,l 


^3,2 


V 


S 3, 2 


2^ 

^2,1 


V 


sj,3 


^1.2 


V sjp 


^1.1 


V 


■sjp 


2^ 

^3,2 


V 


S 3, 2 


2^ 

^2,2 


V 


sj,3 




V sjp 


^1,2 


V 


■sjp 


2® 

^3,1 


V 


S 3, 2 


2^ 

^2,1 


V 


S2,3 


7^ 

^2.1 


VS2p 


^1.2 


V 


Si. 2 


2® 

^3,2 


V 


S 3, 2 


^2,2 


V 


S2,3 


^2.2 


VS2p 


7^ 

^1.1 


V 


Si. 2 


2 '^ 

^3,1 


V 


S 3, 2 


2^ 

^2,1 


V 


sj,3 


7^' 

^2.2 


V sjp 


7^' 

^1.2 


V 


Si. 2 


2 '^ 

^3,2 


V 


S 3, 2 


^3 

^3,1 


V 


S3,3 


7^ 

^2.1 


V sj,i 


2^' 

^1.1 


V 


'Sj.2 


2® 

^3,1 


V 


S 3, 2 


2^ 

^3,2 


V 


S3,3 


7^’ 

^2.2 


V sj,i 


2^' 

^1.2 


V 


'Sj.2 


2^ 

^1,1 


V 


Si, 3 


2^ 

^3,2 


V 


S3,3 


^2.1 


V sjp 


2^’ 

^1.1 


V 


sj .2 


^1,2 


V 


SI .3 


^3,1 


V 


sj,3 


^2.2 


V sjp 


^2.1 


V 


sj .2 


^1,2 


V 


SI .3 


^3,2 


V 


sj,3 


^2.1 


V sjp 


23' 

^2.2 


V 


sj .2 


^1,1 


V 


sj,3 


2^ 

^3,1 


V 


S3,3 


^3,1 


VS3p 


23' 

^2,2 


V 


S2.2 


2® 

^1,2 


V 


sj,3 


2^ 

^3,2 


V 


S3,3 


z^ 

^3,2 


V S3P 


2® 

^2,1 


V 


S2.2 


^1,1 


V 


Si, 3 


2^ 

^3,1 


V 


S 3 , 3 



After examing the formulae or applying the equations (1) and (2), we see it 
has 48 variables, and 137 clauses. Although the encoding is not exponential in 
the number of variables, we may not need to find a minimal k, say k*, by testing 
all Ek, from fc = 1 to A. Instead, we can apply a bisection algorithm between 
boundaries 1 < A: < |y^|. Convergence is assured since there is one and only one 
root, that is, the encoding is unsatisfiable in 1 < fc < A:*, and satisfiable from 
k > k*. Say, the k* is exactly the turning point or root between unsat to sat. 
We arrive therefore to the minimal solution in at most log 2 \Y^ \ — 1 steps. 



3 Decoding the Circuit 

After introducing the formulae to a SAT solver, we found that it is satisfiable, 
thus values of some variables should be examinated. In order to show the final 
circuit, we must inspect the values of s, s', and w. First two sets are useful to 
determine if a literal Xi is part of the j-th product: 

jo if Xi is in the j-th term 
j 1 otherwise 



s 



ji 



0 if Xi is in the j-th term 

1 otherwise 



The next table shows a valid assignment for the s and s' set: 
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Assignment 


Meaning 


Sl ,2 = 0 


add X 2 to Pi 


•®1,3 = ^ 


add Sato pi 


S 24 = 0 


add xi to p 2 


S2,3 = 0 


add 3:3 to p 2 


S 34 = 0 


add a;i to pa 


■® 3,2 = 0 


add a; 2 to pa 


Sl,l = 1 


do nothing 


•®ui = 1 


do nothing 


S 2,2 = 1 


do nothing 


® 2,2 = 1 


do nothing 


S 3.3 = 1 


do nothing 


®3.3 = 1 


do nothing 



Therefore, we construct three minterms: pi = X 2 X 3 , p 2 = X 1 X 3 , and p^ = xiX 2 - 
Now the problem reduces to find out which product corresponds to the Uoutput. 
This is done by looking for positive values of w: 

j 1 if j-th minterm appears in the Uth output 
0 otherwise 

Next table shows a valid assignment for the w set: 



Assignment 


Meaning 


Wis = 1 


add Pi to Pi 


W 2 S = 1 


add p 2 to Pi 


Wl,2 = 1 


add Pi to p 2 


W3,2 = 1 


add Pa to p 2 


W2,2 = 0 


do nothing 


W 3.1 = 0 


do nothing 



Finally, from the last table we finally rewrite yi and j /2 as: 



yi = X 2 X 3 +X 1 X 3 



J/2 = X2X3+X1X2 

This is the minimal solution, in canonical sum, for the truth table given 
above. Figure 1 depicts this circuit in logic array format. 



4 Local Search Landscape 

We have seen how to transform a truth table and decode its solution to produce 
valid circuits. Now the critical point is the SAT solver algorithm. Although we 
shall try to avoid an endless list of papers on this area, we should point out 
that simple randomised strategies plus random walk have been very successful 
for propositional satisfiability, see e.g. [5, 11]. 
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Fig. 1. Logic array representation {k* = 3) for the example. 



GSAT [11] is a randomised greedy search, where we flip and set boolean 
variables accordingly to its gain. It starts at some initial state (randomised initial 
assignment) and performs a random walk, by flipping some variables with some 
probability. We select thus the variable which produces the largest increase in 
the number of satisfied clauses. The probability to flip one variable is normally 
assumed to be however this is not mandatory since we can set the probability 
distribution depending on the input formula [5]. Moreover, with some probability 
we flip one variable at random, even if such a variable does not satisfy many 
clauses. GSAT is one technique from a countless mass of successful application 
of randomised search heuristics. 

It should be clear that such a randomised procedure does not confirm that 
a formulae is unsatisflable at all. However, here we show how the local search 
behaviour may indicate, or at least give a clue, whether or not it is satisflable. 



4.1 Computing effort 

Figure (2) depicts the local search landscape for a three bits multiplier. We set 
a cutoff, in the random walk, of 1000 times the number of variables, and two 
restarts. From the figure we can see the sharp slump down in the local search 
effort after just crossing the satisfiability barrer. That is, such encoding is unsat 
from 1 to 29 products, meanwhile is certainly sat from 30. We may thus see that, 
although it is an incomplete solver, its behaviour is very convenient to identify 
the satisfiability threshold. 

It is also interesting to note the asymptotic tail. Once we have found a satis- 
flable Ek, every successive fc -I- 1 has more and more feasible solutions, which in 
turn helps to the search algorithm to finish faster, i.e. less computing effort. This 
dramatic slump down in the search for a valid assignment has been observed in 
every circuit we designed. Unfortunately, such a nice behaviour does not appear 
in complete SAT solvers, therefore we are still having some uncertainty of the 
k*, e.g. this may be A:* — 1. Last but no least, we have seen that a cutoff of 1000 
times the number of variables is very successful and scales with the problem size, 
so it does not make sense to try different cutoffs even in larger circuits. 

We do produce shorter or similar expressions than Espresso [1] or MisII 
[2], however our search procedure is slower. It is worth to mention that those 
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Number of terms in canonical form 

Fig. 2. Local search landscape for a 3bits multiplier. Ek is still unsat until fc < 29 
products, thus the local search arrives to the cutoff. Then, in the satishability turning 
point, fe = 30, the search effort slump sharply. 



softwares also suffer scaling factor, whereas the procedure shown before scales 
polynomially with the number of variables. For circuits with hundreds entries in 
the truth table we may spend some hours in finding the minimal k. The resulting 
encoding (E^q) for the three bits multiplier has 576 Kb, thus it is not a problem 
for any current PC nor workstation. 

5 Examples and Discussion 

Combinational circuit design has been an active area for long time. However, the 
main drawback has been the problem scale. Whilst it is relatively easy to design 
and minimise small circuits, search grow too fast to manage large and complex 
functions. With apologies to many scientists who have been active in this field, 
we proceed to show a comparison, rather than a comprehensive review. 

We selected two representative circuits, adder and multiplier, to compare 
the underlying boolean circuitry. Multipliers have been argued to grow expo- 
nentially with the number of input variables [10]. Please notice that we shall 
put all examples in canonical form, translating XORs, and avoiding MUX gates. 
This representation is suitable for standard Programmable Logic Arrays, where 
/c— layers of circuitry are implemented with AND and OR gates, and not re- 
stricted to gates having only two inputs. We may say our circuits are minimised 
in propagation delay, with a worst case delay of fc-levels of logic. 
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5.1 Multipliers 

To begin with, we show the minimal function for a two bits multiplier. Table 1 
shows its truth table A* B = C, where A is X\X 2 , B is x^x^, and C is yij/ 2 j/ 3 j/ 4 : 



Table 1. Truth table for a two bits multiplier. 



XI 


X2 


X3 


X4 


yi 72 ys 74 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


0 


1 


0 


0 


0 


0 


0 


0 


1 


0 


0 


0 


0 


0 


• 0 


0 


1 


1 


0 


0 


0 


0 


1 


1 


0 


0 


0 


0 


0 


0 


1 


1 


0 


1 


0 


0 


1 


1 


1 


1 


1 


0 


0 


1 


1 


0 


1 


1 


1 


1 


1 


0 


0 


1 



For this small circuit we found that the minimal expression in canonical form has 
seven different products (i.e. k* = 7), which are shown below. Figure 3 depicts 
this circuit in logic array format. 



yi = XiX2Xs,XA 

H2 = X\X2XzXi + XxXzXa 
7/3 = X\X2X3 + X\X2XzXi + X\XzXa + X2XSX4 
7/4 = X2a;4 

This 2bits multiplier is shorter than the expression found by Miller et al. [10]. 
The comparison was done after replacing one XOR gate to its corresponding 
AND, OR, and NOT representation. Figure 4 depicts this circuit in logic array 
format. The following eight different products corresponds to their findings: 



7/1 = X1X2X3X4 

7/2 = X1X2X3 + X1X3X4 
7/3 = X1X2X3 + X1X2X4 + X1X3X4 + X2X3X4 
7/4 = X2X4 

Surprisingly, after changing two XORs from the 2-bits multiplier shown by Coello 
et al. [3,4], we arrived to exactly the same eight products previously mentioned. 

Analogously, a 3-bits multiplier shown by Miller et al. [10] has 36 different 
terms in sum of products, while the minimal is k* = 30. Notice that this com- 
parison is done after replacing their nine XORs by elementary gates, i.e. AND, 
OR, NOT. 
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Fig. 3. Minimal 2-bits multiplier in logic array, k* = 7. 




Fig. 4. Two bits multiplier found by Miller et al. and Coello et al. in logic array, k = 8. 



Both Coello et al. and Miller et al. approaches are based on self-adaptive 
extrinsic evolution [6], which has produced short and interesting patterns in 
digital design. Lamentably, after replacing their XORs those designs have larger 
circuitry in terms of basic components. 
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5.2 Circuit complexity 

Additional information on different multipliers and adders are summarised in 
table 2. 



Table 2. Search space needed to produce some minimal circuits. 



Description 


fe* 


Search space 


Two bits multiplier 


7 




Two bits adder 


11 


2363 


Three bits multiplier 


30 


23870 


Three bits adder 


31 


24340 


Four bits adder 


75 


248975 


Four bits multiplier 


121 


284942 



For every entry we present their description, k* , as well as the search space 
where a first solution was found. The latter number may give us an idea about 
the circuit hardness. For all circuits we can calculate the number of variables 
(from eq. 1) needed to encode a minimal circuit in canonical form. This is an 
unbiased search space where the first, i.e. most difficult, circuit can be found. 
Moreover, it is not related to any data structure, objectives to minimise, gates, 
matrix codification nor length allowed to represent a circuit. Hence we may say 
that synthesising a minimal three bits adder is slightly harder than a three bits 
multiplier, as can be seen from table (2) under “Search space”. Interestingly, 
same analogy does not hold in four bits. 

6 Conclusions and Future Work 

We have shown a fresh view on how to synthesise and minimise combinatorial 
circuits with SAT. This minimisation can be efficiently done by means of a 
bisection algorithm and a randomised local search. We have found a cutoff in 
the local search algorithm to one thousand times the number of variables, which 
works reasonably well, however a complete algorithm, which can assure the exact 
k* is still exponential. Our experiments with complete solvers reveal that the 
exact k* is at most one product away from the solution found by the randomised 
algorithm. 

We are still missing an exhaustive comparison of complete solvers for this 
particular encoding. It had been noticed that some solvers perform quite well 
on certain encodings, whilst very poor in others. Other thing that remains to be 
investigated is the possibility to improve our current SAT solvers by exploiting 
any internal structure or pattern of this fascinating encoding. 
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Abstract. As mobile devices become more powerful, interprocess com- 
munication becomes increasingly more important. Unfortunately, this 
larger freedom of mobility gives rise to unknown environments. In these 
environments, processes that want to communicate with each other will 
be unable to do so because of protocol conflicts. Although conflicting 
protocols can be remedied by using adaptors, the number of possible 
combinations of different protocols increases dramatically. Therefore we 
propose a technique to generate protocol adaptors automatically. This 
is realised by means of genetically engineered classifier systems that use 
Petri nets as a specification for the underlying protocols. This paper 
reports on an experiment that validates this approach. 



1 Introduction 

In the field of evolvable computing, software (and hardware) is developed that 
adapts itself to new runtime environments as necessary. The runtime environ- 
ments targetted in this paper are open distributed systems in which interprocess 
communication forms an essential problem. In these environments an applica- 
tion consists of processes that communicate with other processes to reach specific 
goals. 

With the advent of mobile devices these processes do not necessarily know 
in which kind of runtime environments they will execute. Therefore they rely on 
standardised solutions, such as JINI, to find other processes offering a certain 
behaviour. 

Once the other process is known, the real problems start. How can the re- 
questing process communicate with the unknown offered process? Given the 
fact that those processes are developed by different organisations, the protocols 
provided and required can vary greatly. As a result protocol conflicts arise. 

* Corresponding author. He is developing peer-to-peer embedded systems for a project 
funded by the Flemish Institute for Science and Technology (IWT). 
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On first sight, a solution to this problem would be to offer protocol adaptors 
between every possible pair of processes. The problem with this approach is that 
the number of adaptors grows quadratic to the number of process protocols and 
as such it simply doesn’t scale. The solution is to automate the generation of 
protocol adaptors between communicating processes. 

As a potentially useful technique for this adaptor generation, we explored the 
research domain of adaptive systems. We found that the combination of genetic 
programming, classifier systems, and a formal specification in terms of Petri nets 
allowed us to automate the detection of protocol conflicts, as well as the creation 
of program code for adaptors that solve these conflicts. This paper reports on 
an experiment we performed to validate this claim. 

2 Prerequisites of Interprocess Communication 

Processes communicate with each other only by sending messages over a com- 
munication channel (similarily to CSP |P and the 7r-calculus m- Communica- 
tion channels are accessed by the process’ ports. Processes communicate asyn- 
chronously and always copy their messages completely upon sending. The con- 
nections between processes are full duplex: every process can send and receive 
messages over a port. This brings us in a situation where a process provides a 
certain protocol and requires a protocol from another process. A process can 
have multiple communication channels: for every communication partner and 
for every provided/required protocol. 

We imposed other requirements on the interprocess communication to allow 
us to generate adaptors: 

1. Implicit addressing. No process can use an explicit address of another pro- 
cess. Processes work in a connection-oriented way. The connections are set 
up solely by one process: the connection broker. This connection broker will 
also evolve adaptors and place them upon the connections when necessary. 

2. Disciplined communication. No process can communicate with other pro- 
cesses by other means than its ports. Otherwise, ‘hidden’ communication 
(e.g., over a shared memory) cannot be modified by the adaptor. This also 
means that all messages passed over a connection should be copied. Messages 
cannot be shared by processes (even if they are on the same host), because 
this would result in a massive amount of concurrency problems. 

3. Explicit protocol descriptions. While humans prefer a protocol description 
written in natural language, computers need an explicit formal description 
of the protocol semantics. A simple syntactic description is no longer suitable. 

3 Specifying Protocols 

As a running example we choose a typical problem of communicating processes: 
how processes synchronise with each other. Typically, a server provides a con- 
currency protocol (often a transaction protocol) [E] that can be used by clients. 
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The clients have to adhere to this specification or they won’t function. Since 
the clients also expects a certain concurrency behaviour from the server, it is 
possible that the required interface and provided interface differ. 

For example, a client/server can require/provide a full-fledged optimal trans- 
action protocol or it can require/provide a simple locking protocol. When two 
such protocols of a different kind interact, we can run into an incompatibility 
problem. 

In our example we use a simple locking protocol of the server with which a 
client can typically lock a resource and then use it. The API for the server is 
described as follows. (A similar protocol description can be given for the clients.) 

incoming lock(resource) 

outgoing lock_true (resource) 
outgoing lock_false (resource) 

// lock_true or lock_false are sent back whenever a lock 
// request comes in: lock_true when the resource is locked, 

// lock_false when the resource couldn’t be locked, 
incoming unlock(resource) 

outgoing unlock_done (resource) 

// will unlock the resource . Send unlock_done back when done . 
incoming act (resource) 

outgoing act_done (resource) 

II will do some action on the process. 

The semantics of this protocol can be implemented in different ways. We will 
use two kinds of locking semantics [ 0 : 

Counting semaphores allow a client to lock a resource multiple times. Every time 
the resource is locked the lock counter is increased. If the resource is unlocked 
the lock counter is decreased. The resource is finally unlocked when the counter 
reaches zero. 

Binary semaphores provide a locking semantics that doesn’t offer a counter. It 
simply remembers who has locked a resource and doesn’t allow a second lock. 
When unlocked, the resource becomes available again. 

Differences in how the API considers lock and unlock can give rise to protocol 
conflicts. In figure Qthe client process expects a counting semaphore from the 
server process, but the server process offers a binary semaphore. The client can 
lock a resource twice and expects that the resource can be unlocked twice. In 
practice the server just marked the resource as locked. If the client unlocks the 
resource, the resource will be unlocked. Acting upon the server now is impossible, 
while the client expects it to be possible. 

This protocol conflict arises because the API does not specify enough se- 
mantic information. Hence, we propose to use a more detailed and generally 
applicable formalism, namely Petri nets, to offer an explicit description of the 
protocol semantics. Petri nets [Q offer a model in which multiple processes tra- 
verse states by means of state transitions. 

In the context of our locking example, this allows us to write a suitable 
adaptor by relying on: (1) which state the client process expects the server to 
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Fig. 1. Conflict between a counting semaphore protocol and a binary semaphore pro- 
tocol. 



be in; (2) in which state the server process is. Both kinds of information are 
essential: If we don’t know that the client thinks that it still has a lock, and 
we don’t know that the state of the server is unlocked (see figure P, no learned 
algorithm can make a correct decision. 

As an example Petri net that offers the needed semantics, the left part of 
figure ^specifies a binary semaphore locking strategy. The current state is un- 
locked. From this state the process requiring this protocol, can choose only one 
action: lock. It then goes to the locking state until lock_true or lock_false comes 
back. We can also use this Petri net to model the behaviour of a process that 
provides this protocol. It is perfectly possible to offer an protocol that adheres 
to this specification, in which case, the incoming lock is initiated from the client, 
and lock-true or lock-false is sent back to the client when making the transition. 



4 Evolving Protocols 

Protocol adaptors overcome the semantic differences between two processes. We 
propose to use a genetic algorithms 0 with classifier systems to generate adap- 
tors. Classifier systems are known to work very well in cooperation with genetic 
algorithms, they are Turing complete and they are symbolic. This is important 
because our Petri net description is in essence a symbolic representation of the 
different states in which a process can reside. 

If this representation would be numerical, techniques such as neural networks 
[0, reinforcement learning and Q-learning [0 could probably be used. 
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Fig. 2. Two Petri-net descriptions of process protocols. Ellipses correspond to states. 
Rectangles correspond to transitions. The current state (marked with “!’) is coloured 
yellow. The red transitions (marked with ‘*’) represent incoming messages. The blue 
ones (marked with represent outgoing messages. 



The standard questions before implementing any genetic programming tech- 
nique are: What are the individuals and their genes? How do we represent the 
individuals? How do we define and measure the fitness of an individual? How do 
we initially create individuals? How do we mutate them and how do we create 
a cross-over of two individuals? How do we compute a new generation from an 
existing one? For a quick overview of the parameters of our genetic program we 
refer to table 0 

In our implementation, the individuals will be protocol adaptors between 
communicating processes. The question of how to represent these individuals is 
more difficult. We could use well-known programming languages to represent 
the behaviour of the adaptor. Unfortunately, the inevitable syntactic structure 
imposed by these languages complicates the random generation of programs. 
Moreover, these programming languages do not offer a uniform way to access 
memory. 

An alternative that is more suitable for our purposes is the Turing complete 
formalism of classifier systems |^. A classifier system is a kind of control system 
that has an input interface, a finite message list, a classifier list and an output 
interface. The input and output interface put and get messages to and from 
the classifier list. The classifier list takes a message list as input and produces 
a new message list as output. Every message in the message list is a fixed- 
length binary string that is matched against a set of classifier rules. A classifier 
rule contains a number of (possibly negated) conditions and an action. These 
conditions and actions form the genes of each individual in our genetic algorithm. 
Conditions and actions are both ternary strings (of 0, 1 and #). is a pass- 
through character that, in a condition, means ‘either 0 or 1 matches’. If found 
in an action, we simply replace it with the character from the original message. 
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Table 1. Illustration of how actions produce a result when the conditions match all 
messages in the input message list. ' is negation of the next condition. A disjunction 
of two conditions is used for each classifier rule. The second rule does not match for 
input message 001. The third rule does not match because the negated condition is not 
satished for input message 001. 



Input message list = { 001, 101, 110, 100 } 



Condition 


Action 


Matches 


Result 


00# 


101 


111 


yes 


111 


01# 


1## 


000 


no 


/ 


1## 


'00# 


### 


no 


/ 


1## 


### 


1#0 


yes 


100, no 



Output message list = { 111, 100, 110 } 



Table ^shows a very simple example. When evaluating a classifier system, all 
rules are checked (possibly in parallel) with all available input messages. The 
result of every classifier evaluation is added to the end result. This result is the 
output message list. For more details, we refer to [0. 

A classifier system needs to reason about the actions to be performed based 
on the available input. In our implementation, the rules of a classifier system 
consist of a ternary string representation of the client state and server state (as 
specified by the Petri net), as well as a ternary string representing the requested 
Petri net transition from either the client process or the server processCJ With 
these semantics for the classifier rules, translating a request from the client to 
the server requires only one rule. Another rule is needed to translate requests 
from the server to the client (see table □El 

The number of bits needed to represent the transitions depends on the num- 
ber of possible state transitions in the two Petri nets of figure 0 The number 
of bits needed to represent the states of each Petri net depends on the number 
of states in the Petri net as well as the variables that are stored in the Petri 
net (e.g., the lockcount variable requires 2 bits if we want to store 4 levels of 
locking) . 

Although this is a simple example, more difficult actions can be represented. 
Consider the situation where the client uses a counting-semaphores locking strat- 
egy and the server uses a binary-semaphores locking strategy. In such a situation 
we don’t want to send out the lock-request to the server if the lock count is larger 
than zero. Table 0 shows how we can represent such a behaviour. 



^ Currently we are investigating whether we can drop this translation and generate 
an adaptor from the Petri net representation directly. 

^ This method of using full classifier systems as individuals is known as the Pittsburgh 
approach 0 . The Michigan approach, whereby a set of classifier rules evolve together 
to reach a solution^, is not suitable for our purposes because one rule doesn’t cover 
the behaviour of an adaptor. As such, cross-over between single rules would not help 
that much. 
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Table 2. Blind translation between client and server processes. The last 5 characters 
in column 1 represent the corresponding transition in the Petri net. The characters 
in the second and third column represent the states of the client and server Petri 
net, respectively. The fourth column specifies the action to be performed based on the 
information in the first four columns. 



classifier condition | 


action 


rule description 


requested 

transition 


client 

state 


server 

state 


performed 

action 




00#### 


#### 


### 


11#. 


■ # 


Every incoming action from the client (00) 
is translated into an outgoing action on the server (11) 


01#### 


#### 


### 


10#. 


■ # 


Every incoming action from the server (01) 
is translated into an outgoing action to the client (10) 



Table 3. Translating a client process lock request to a server process lock action when 
necessary. 



classifier condition 


action 


rule description 


requested 

transition 


client 

state 


server 

state 


performed 

action 




00 001 


'##00 


### 


10 010 .. . 


If the client wants to lock (001) and already has a lock 
we send back a lock_true (010) 


00 001 


##00 


### 


11 001 .. . 


If the client wants to lock (001) and has no lock (^:ji^00) 
we immediately send the message through (001). 



The genetic programming we implemented uses a full classifier list with vari- 
able length. The classifier list is an encoding of the Petri nets, as representation 
for the individuals. Every individual is initially empty. Every time an individual 
encounters a situation where there is no matching gene a new gene (i.e., a new 
classifier rule) will be added with a condition that covers this situation and a 
random action that is performed on the server and/or the client. This way of 
working, together with the use of Petri nets guarantees that the genetic algo- 
rithm will only search within the subspace of possible solutions. Without these 
boundaries the genetic algorithm would take much longer to find a solution. 

Fitness of an individual is measured by means of a number of test scenarios. 
Every test scenario illustrates a typical behaviour the client requests from the 
server. The fitness of an individual is determined by how many actions the 
scenario can execute without yielding unexpected behaviour. Of course this is 
not enough; we should not have solutions that completely shortcut the server. For 
example, the algorithm could return lock_true every time a request comes in from 
the client, without even contacting the server. To avoid this kind of behaviour 
our algorithm provides a covert channel that is used by the test scenario to 
contact the server to verify its actions. 

The genetic programming uses a steady-state GA, with a ranking selection 
criterion: to compute a new generation of individuals, we keep {reproduce) 10% of 
the individuals with the best fitness. We throw away 10% of the worst individuals 
(not fit enough) and add cross-overs from the 10% best groupQ To create a cross- 
over of individuals we iterate over both classifier lists and each time randomly 



® These values were taken from [03 and gave good results during our experiments. 
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Table 4. Parameters and characteristics of the genetic program 



parameter 


value 


individuals (genotype) 


variable-length classifier system represented as bitstring 


population size 


100 


maximum generations (100 runs) 


11 


parent selection 


ranking selection (10 % best) 


mutation 


bitflip on non ranked individuals 


mutation rate 


0.8 


crossover 


uniform 


crossover rate 


0.1 


input/output interfacing 


Petri net state/transition representation 


actions 


message sending 


fitness 


number of successfully executed actions 



select a rule that will be stored in the resulting individual. It should be noted that 
the individuals that take part in cross-over are never mutated. The remaining 
80% of individuals are mutated, which means that the genes of each individual 
are changed at random: for every rule, a new arbitrary action to be performed 
on server or client is chosen. On top of this, in 50% of the classifier rules, one bit 
of the client and server state representations is generalised by replacing it with 
a This allows the genetic program to find solutions for problems that are not 
presented yet. 

5 The Experiment 

We will now present the experiment that shows the feasibility of the above tech- 
niques to automatically learn an adaptor between incompatible locking strate- 
gies. 

The experiment is set up as a connection broker between two processes. 
The first process contacts the second by means of the broker. Before the broker 
sets up the connection it will generate an adaptor between the two parties to 
mediate semantic differences. It does so by requesting a running test process from 
both parties. The client will produce a test client and test scenarios. The server 
will produce a test server. In comparison with the original process, these testing 
processes have an extra testing port, over which we can reset them. Furthermore, 
this testing port is also used as the covert channel for validating actions at the 
server. 

The genetic program, using a set of 100 individuals (i.e., adaptor processes), 
will deploy the test processes to measure the fitness of a particular classifier 
system. Only when a perfect solution is reached, i.e., a correct adaptor has been 
found, the connection is set up. For reliability reasons we have repeated the 
experiment (i.e., the execution of the genetic program) 100 times. 

The scenarios offered by the client are the ones that determine what kind 
of classifier system is generated. We have tried this with three scenarios, as 
illustrated on the left of figure 0 Scenario 1 is a sequence: [lockO , act() , 
unlock ()]. Scenario 2 is the case we explained in figure 0 Scenario 3 is similar 
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Client 
Scenario 1 



lock true 



act done 



unlock done 



Client 
Scenario 2 



Client 
Scenario 3 



Adaptor 



1 ^ 1 ^ 1 ^ 




Z lock I 


I lock I 


I act 


lock true 


lock true I 


act 


Z lock I 


I act I 


act done 1 


lock true 


act done genetic ^ I 

algorithm : 


unlock 


Z unlock I 


I act I 


unlock done 1 


unlock done 


act done 


act done 


Z act I 


I unlock I 


I unlock 


1 act done 


unlock done 


unlock done 



Fig. 3. The three test scenarios we used as initial input to the genetic programming. 
The dashed vertical lines are waits for a specific message (e.g., lock_true). When using 
only scenarios 1 and 2 as input, one of the generated adaptors behaved as specified on 
the right hand side. 



to scenario 1: [lockO , actO , actO , unlock ()]. The reason why we added 
such a look-alike scenario will become clear in our observations. In all three 
scenarios, we issue the same list of messages three times to ensure that the 
resource is unlocked after the last unlock operation. 



5.1 Observations 

An examination of the results of several runs of our genetic programming algo- 
rithm lead to the following observations: 

When we used the covert channel to measure the fitness, we found that (for all 
100 runs of the GA) a perfect solution was found within at most 11 generations. 
When we didn’t use the covert channel to check the actions at the server side, the 
genetic algorithm often (30%) created a classifier that doesn’t even communicate 
with the server. In such a situation the classifier immediately responds lock_true 
whenever the client requests a lock. 

Occasionally a classifier system was generated with a strange behaviour. Its 
fitness was 100%, but it worked asynchronously. In other words, the adaptor 
would contact the server process before the client process even requested an 
action from the server process. It could do so because the adaptor knew that the 
client would request that particular action in the given context. This is illustrated 
on the right of Figure Q It implies that a learned algorithm can anticipate certain 
kinds of future behaviour. 

Initially, we only used scenario 1 and 2 to measure the fitness of each in- 
dividual. We encountered the problem that sometimes, the adaptor anticipates 
too much and after the first act, keeps on acting. This problem was solved by 
assigning a zero fitness to such solutions. 
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Table 5. The generated classifier system for a single run. 



requested 

transition 


client 

state 


server 

state 


performed 

action 


description 


00 00 01 


10 00 #1 


11 #### 


00 10 01 


client?lock() & client=locked client. lock_true() 


00 00 01 


10 00 1# 


11 #### 


00 10 01 


client?lock() client=locked client. lock_true() 


00 00 01 


10 00 00 


" 11 010 


00 11 01 


client?lock() Sz client 7^ locked server. lock() 


00 00 10 


10 00 1# 


11 #### 


00 10 11 


client?unlock() Sz clientlock>2 client. unlock_done() 


00 00 10 


10 00 01 


11 #### 


00 11 10 


client?unlock() Sz clientlock=l server. unlock() 


00 00 11 


10 ##### 


11 #### 


00 11 11 


client?act() — >• server. act() 


00 01 10 


10 ##### 


11 #### 


00 10 10 


server?act_done() client. act_done() 


00 01 00 


10 ##### 


11 #### 


00 10 00 


server?lock_false() client. lock_false() 


00 01 01 


10 ##### 


11 #### 


00 10 01 


server?lock_true() client. lock_true() 


00 01 11 


10 ##### 


11 #### 


00 10 11 


server?unlock_done() client. unlock_done() 


00 00 01 


10 00 00 


11 010 


00 10 00 


client?lock() Sz server=locked & client 7^ locked 
client. lock_false() 



As a last experiment we measured the fitness by combining information from 
all three scenarios. This allowed the genetic algorithm to find a perfect solution 
with less generations because separate individuals that developed behaviour for 
a specific test scenario were combined in a later generation using cross-over. This 
illustrates the necessity for a cross-over operator. A random search would take 
considerably more time to find a combination of both behaviours. 

One of the classifier system that is generated by the genetic algorithm, when 
providing as input all three test scenarios, is given in table Q The produced 
classifier system simply translates calls from client to server and vice versa, unless 
it is about a lock call that should not be made since the server is already locked. 
The bit patterns in the example differ slightly from the bit patterns explained 
earlier. This is because we need the ability to make a distinction between a 
’transition- message’ and a ’state-message’. All transition messages start with 00 
and all state-messages start with 10 for client-states and 11 for server-states. 

5.2 Discussion 

In our approach, the problem of ’writing correct adaptors’ is shifted to the prob- 
lem of ’specifying correct test sets’: whenever the developer of a process encoun- 
ters an incompatibility, he needs to specify a new test scenario that avoids this 
behaviour. This test scenario is given as additional input to the genetic algo- 
rithm, so that the algorithm can find a solution that avoids this incompatibility. 
The result of this approach is that the programmer does not have to implement 
the adaptors directly, but instead has the responsibility of writing good test sets 
(i.e., a consistent set of scenarios that is as complete as possible). This is a non- 
trivial problem, since the test set needs to cover all explicit as well as implicit 
requirements. The main advantage of test sets over explicit adaptors is that we 
would need a new adaptor for every pair of communicating processes, while we 
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only need one test set for each individual process. As such, test sets are more 
robust to changes in the environment: when a process needs to communicate 
with a new process, there is a good chance that the test set will already take 
into account potential protocol conflicts. Another important advantage of test 
sets is that they can help in automatic program verification. Bugs in the formal 
specification (the Petri net) can be detected and verified at runtime. As such, 
this approach helps the developer to stay conform to the program specification. 
This clearly helps him in his goal to write better software. 

Below we discuss some strengths and weaknesses of our approach: 
Automatically generated adaptors can be better than hand-crafted adapters 
since they can reorder incoming and outgoing messages as necessary. This can 
result in anticipated behaviour that boosts performance. 

The genetic algorithm we proposed has the problem that it needs to learn a 
certain behaviour based on a very small set of examples. Therefore, the learning 
algorithm will automatically generate more general or more specific adaptors 
when offered test sets. This tendency to generalize matches does not always 
correspond to how a programmer tries to generalise. If there is a close corre- 
spondence, the programmer simply needs to write test sets that will naturally 
be generalised to the desired adaptor. On the other hand, if the generalisation 
(or specialisation) does not fit the developer’s way of thinking, the algorithm 
will generate seemingly illogical adaptors. 

An ideal test scenario should cover all the actions that will be invoked upon the 
server in all possible combinations. How can we write good tests that do not 
leave any open holes for the programmer? And if we can write such tests, are 
the Petri nets still necessary? In other words, is it possible to learn the adaptor 
automatically just by looking at the interaction between the processes? 

Some protocol conflicts that seem simple at first sight cannot be solved (not 
even by humans). For example, the locking protocol example could be used to 
lock simple (a:, y) - specified cells on a checker board. It is impossible to protocol 
this with a locking strategy that locks and unlocks the whole board at once. A 
simple solution such as ‘lock all fields’ will not work because other communication 
partners can enter the held and lock a single position. This indicates that the 
approach presented here is good in solving ‘control flow’ problems but is bad at 
converting ‘data representations’. However, this is a general AI problem [ B 1 J for 
which no solution yet has been found. 

In this paper we presented only a simple example with small protocols (binary 
versus counting semaphore locking strategies) for the sake of the presentation. 
We are currently investigating how we can write adaptors by generating a suit- 
able petri-net instead of a classifier system. More elaborate experiments and 
technical details can be found on http://borg.rave.org/adaptors/. 



6 Conclusion 

We proposed an automated approach to create intelligent protocol adaptors to 
resolve incompatibilities between communicating processes. Such an approach is 



Using Genetic Programming to Generate Protocol Adaptors 433 



indispensable to cope with the combinatorial explosion of protocol adaptors that 
are needed in an open distributed setting where processes interact with other 
processes in unpredictable ways. 

Our approach uses a genetic programming tecnique that evolves classifier 
systems. These classifier systems contain classifiers that react upon the context 
they receive from both client process and server process. The context is defined 
as a combination of the possible client-side and server-side states as given by 
a user-specified Petri net. To measure the fitness of an adaptor, the user needs 
to provide test scenarios as input. This enables the user to avoid undesired 
behaviour in interprocess communication. 
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Abstract. This paper presents a methodology to design optimized electronic 
digital systems from high abstraction level descriptions. The methodology uses 
Genetic Programming in addition to high-level synthesis tools to automatically 
improve design structural quality (area measure). A two-stage, multiobjective 
optimization algorithm is used to search for circuits with the desired functional- 
ity subjected additionally to chip area constraints. Experiment with a square- 
root approximation datapath design targeted to FPGA exemplifies the proposed 
methodology. 



1 Introduction 

Recently, integrated circuit (IC) design has focused on meeting time-to-market and 
costumers specific requirements for more and more complex systems [1]. To address 
this new trend, automation of the entire design process from conceptualization to 
silicon or a descrihe-and-synthesize methodology [2] has become necessary. Using a 
top-down design approach, the system functionality is specified through a behavioral 
model, simulated to verify correct operation and synthesized using high-level synthe- 
sis (HLS) tools. 

A chip design process using HLS involves several steps, the first being the specifi- 
cation of the system functionality. To this end a variety of conceptual models can be 
used as discussed in detail in [3]. Finite-state machines (FSM), finite-state machines 
with datapath (FSMD) and Petri nets are some examples. Programming languages are 
conceptual models of particular interest since they can support heterogeneous model- 
ing, i.e., they can describe data, activity and control simultaneously. Hardware de- 
scription languages (HDLs) such as VHDL [4], [5], are able to capture the system 
functionality in a machine-readable and simulatable form. The system functional 
specification step is followed by the HLS one, where the behavioral model is trans- 
lated into a register-transfer level (RTL) structural model. Logic and layout synthesis 
steps follow in the sequence. 

Although more and more EDA tools are available providing high abstraction level 
design facilities, the market still resists to the automatic behavioral synthesis ap- 
proach. Gajski et al. [6] point out two reasons for this: poor-quality results and lack of 
interactivity during the synthesis process. High-level behavioral specifications are 
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usually considered not quite “synthesizable” since at this abstraction level many low- 
level implementation details are masked (by definition) from the designer [7]. To 
cope with this problem most behavioral synthesis tools constrain the design space by 
assuming a “target architecture” to which all high level specifications are mapped. 
The result is a reasonable average solution locally optimized for each design problem. 

On the other hand, layout synthesis and hardware implementations are considera- 
bly simplified by FPGA (Field Programmable Gate Arrays) prototyping. By eliminat- 
ing the need to cycle through an IC production facility, both time-to-market and fi- 
nancial risk can be substantially reduced. Recent statistics [8] indicate that, today, 
approximately one half of all chip designs are started using FPGAs. This trend is 
followed in modern synthesis tools that offer a smooth migration path between FPGA 
and ASIC technologies, providing direct mapping of an FPGA design into ASIC 
libraries [9]. 

This work presents a digital IC design methodology based on Evolvable Hardware 
(EHW) techniques [10], [11]. Genetic Programming is used to automatically explore 
the design space and improve the design quality. To accomplish this, a multi-criteria 
fitness function was defined to rank candidate solutions according to their area sizes. 
An experiment with a square-root approximation datapath design targeted to EPGA is 
detailed. Simulation results confirm the automatic generation of optimized structures 
starting from a high level specification. 

The paper is organized as follows: Section 2 presents related works in order to es- 
tablish the context in which the proposed methodology is applied. In Section 3 the 
Evolutionary Computation and Genetic Programming concepts of interest to the pro- 
posed methodology, are discussed. The design methodology is detailed in Section 4 
and the example of a datapath design is presented in Section 5. Conclusions are drawn 
in Section 6. 



2 Related Works 

Evolutionary Algorithms have already been used in lower levels of electronic design, 
including routing, partitioning and placement [1], [12], and more recently in the 
automation of the structural design of digital circuits, as well as in the search for qual- 
ity implementations, as related below. 

Recent works [13] - [16] present evolved structures entirely in software using 
computer simulations (extrinsic evolution) to evaluate intermediate solutions. In [13] 
an algorithm capable of evolving 100% functional arithmetic circuits is presented. 
Based on a rectangular array of uncommitted logic cells of FPGAs, the algorithm is 
able to re-discover conventional optimum designs for one and two-bit adders, eventu- 
ally improving the gate count of human produced designs of the two-bit multiplier. 

Kalganova & Miller [14] addressed a two-stage multiobjective fitness function. In 
the first stage a 100% functional solution, which occurs when the truth table is 
matched, is sought. In the second stage the number of gates actually used in each 
candidate solution is taken into account in the fitness function. This allows circuits to 
evolve with the desired functionality minimizing their number of active gates. The 
authors limited their focus to combinational logic circuits, containing no memory 
elements and no feedback paths. A similar multiobjective optimization technique was 
proposed by Coello et al. [15], the main difference being the use of a population- 
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based technique to split the search task among several sub-populations. In this ap- 
proach each objective is assigned to a small sub-population, which is merged with the 
rest of the individuals when one of the objectives is satisfied. The merged populations 
contribute to minimize the total amount of mismatches produced between the encoded 
circuits and the truth table. 

Hounsell & Arslan [16] addressed an EHW environment called Virtual Chip in 
which timing constraints are taken into account. In this environment not only gates 
but also functional macros, such as half-adders and compound gates, are available to 
the evolutionary process. In order to speed up the overall process a single VHDL 
program describing all candidate circuits is used in the simulations. In this work a 
phased approach was introduced to ease the evolution of complex systems. The 
phased evolution consists of evolving a circuit in stages: the system initially evolves a 
“sub-circuit” for each output of the desired circuit; in the sequence, redundant logic 
between the evolved sub-circuits is removed as they are combined to generate the 
required circuit. Using this approach a 3-bit multiplier was evolved with results as 
good as CAD based circuits. 

An alternative to the above approaches is the use of formal grammars to represent 
the evolution of hardware designs, as in the work of Hemmi et al. [17]. In this ap- 
proach, hardware specifications, which produce not only hardware structures but also 
behaviors, are automatically generated as HDL programs. To create HDL-descriptions 
the authors used Genetic Programming (GP) to evolve trees from productions (re- 
phrased rules in the HDL grammar). The fitness is evaluated using simulation tools 
and hardware implementation is created after the learning task is completed. This 
system was used to automatically generate a sequential binary adder. 

The approach to be presented in this paper uses the main techniques reported 
above, together with new improvements. Similar to [17], formal grammar of an HDL 
language was used to build candidate solutions. However, the place-and-route task 
was placed inside the evolutionary process in order to supply the fitness function with 
chip area information. The multiobjective fitness function adopted closely parallels 
the two-stage approach suggested in [14], although applied in a different framework. 
Also, an experience was made using a single VHDL program, as suggested in [16], in 
order to reduce computing effort. 



3 Evolutionary Computation and Genetic Programming 

Lvolutionary Computation (LC) consists of a class of probabilistic algorithms in- 
spired by the principles of Darwinian natural selection and variations that can greatly 
benefit from the increased simulation power for complex design, control and knowl- 
edge discovery applications [18]. Genetic Algorithm (GA) and Genetic Programming 
(GP) are the instances of LC most widely used in Lvolvable Hardware [19]. 

GA [20] is based on the evolutionary process occurring in nature where a popula- 
tion is shaped by the survival of best-fit individuals. GA usually works with strings of 
fixed length in which one or more parameters are coded. GP is a branch of GA, the 
main difference being the solution representation. Unlike GA, GP can easily code 
genotypes of different sizes, which increases the capacity in structure creation. 

Koza introduced canonical GP [21] with tree-based genome using LISP language. 
Since that, other genome types have been proposed like linear and graph genomes 
[22]. Other GP systems were developed to automatically create programs in arbitrary 
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languages (other than LISP) [23], [24], An alternative way to encode domain- 
knowledge is by formal grammars [25]. Programs may be produced by combining a 
context-free grammar (CFG) with GP [26] - [29], known as G’P (Grammar-Guided 
Genetic Programming). G’P provides a framework for automatic creation of error free 
programs in arbitrary language. Genetic Programming Kernel (GPK) [28] was used as 
the core engine for the GP system used in this paper. GPK is a complex system that 
evolves programs in any language once a Backus-Naur form (BNF) is provided as 
input. However, the initialization procedure was changed to overcome GPK difficulty 
associated with creating the first generation, as pointed out in [29]. 



4 Design Methodology 

The proposed design methodology evolves independent VHDL programs trying to 
breed a solution (hardware description) that implement the desired functionality and 
satisfies design constraints for area. This methodology uses a mixture of C code and 
VHDL, with HLS tools. The execution flow of the proposed methodology is drawn in 
Fig. 1. This figure shows the two main components of the methodology: the GP core 
and the Valuation function. Also, it can be seen that, differently to other methodolo- 
gies [16], [17], synthesis and place-and-route are performed for each candidate solu- 
tion, which gives an exact value for the area. 

The whole methodology is based on the use of the GP algorithm. Fig. 2 shows the 
flowchart of the GP system used, adapted from [21]. In this figure, the index “i” refers 
to an individual in the population of size M. The variable “Gen” is the number of the 
current generation. A population of M individuals (programs) is randomly created 
with the restriction of a pre-defined BNF containing a subset of VHDL grammar 
obeying the sufficiency property [21]. The system runs until the termination criterion 
is satisfied. Since the proposed methodology is concerned with unknown solutions, 
the termination is achieved when the number of generations exceeds a maximum pre- 
defined number G {generational predicate). 

To create VHDL source codes, the entity declaration and the architecture declara- 
tive part of the VHDL program are maintained fixed, as they are common to all indi- 
viduals. The architecture statement part of the VHDL code will evolve. 



Valuation function 




Fig. 1. Execution flow of the methodology 
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Fig. 2. Flowchart of the GP system 



The evaluation of the objective function is a key procedure of the GP system. The 
steps involved in this operation are depicted in Fig. 3. After a new population is cre- 
ated it is necessary to complete the VHDL source code by inserting the entity declara- 
tion and the architecture declarative part of each individual. Then, the individuals are 
synthesized using Altera MAXh-PLUS II compilation tool [30]. To effectively evalu- 
ate the fitness of an individual it is necessary to simulate each of the fitness cases and 
check the results against target values. 

The objective function to be minimized is a multi-parameter error function com- 
bining functionality with the area parameter in a two-stage approach similar to that 
proposed in [14]. In this process the functional specification (F, fitness) must be first 
matched to trigger the search for an optimal area implementation (F^ fitness). Let the 
average functionality error, /e^^, be defined as: 

-'V 1=1 

Where, D. is the desired value for fitness case i, is the simulated value obtained for 
fitness case i, w. is a weighting factor and N is the number of fitness cases. 

The fitness function F is defined as follows: 



< 



D.-S, 

A 



> te 



otherwise 



( 2 ) 












Using Genetic Programming and High Level Synthesis to Design Optimized Datapath 439 




resource 
usage 
in forma tion 

functional 
behavior 
in forma tion 



Fig. 3. Valuation block of the Fig. 2 



Where, te is the target functional specification error for the fitness cases, lu is the 
number of used logic elements (Cf. Subsection 5.3) and A: is a function of lu. The k(lu) 
function is a weighting factor that considers the effect of the area used to implement 
the desired functionality. It is an exponential function of lu that privileges the indi- 
viduals containing the lowest quantities of logic elements. The k(lu) function will take 
the form: 



k(lu) = (3) 

Where, a and b are constants that must be defined by solving the following system: 



7 

^ . = ae 

min 

= ae 



(4) 



Where, lu^.^ and lu^^ are minimum and maximum quantities of logic elements and 
and are the minimum and maximum values of k(lu), respectively. 



5 Experiment 

To exemplify the proposed methodology, it was selected a datapath design taken from 
[6]. The aim is to compute the square-root approximation (SRA) of two integers, a 
and b. In [6] the following approximation formula is given: 

y]a^+ b^ = max ((0. 875x-t0.5y),x) (5) 

Where x = max (lal, \b\), and y = min (lal, \b\). 

In this problem of symbolic regression the goal is to find a function in symbolic 
form that is a good, or an exact, fit to a group of 20 sets of numerical data points. The 
black box of the SRA is given in Fig. 4. In this figure, the “start” input and “done” 
output signals are present for interface control issues. 
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Fig. 4. Black box of the SRA 



5.1 BNF Definition 

A BNF grammar describes admissible structures of a language through a 4-tuple 
{SJ^,T,P] where S denotes the start symbol, N the set of non-terminal symbols, T the 
set of terminal symbols and P the productions, i.e., rewritten rules that map the ele- 
ments of N to T. The BNF that defines the syntax of the VHDL statement part of the 
SRA description is defined as: 

S : : = < f e > ; 

<fe> ::= "PROCESS BEGIN WAIT UNTIL (clk'event and clk= ' 1 ' ) ; 

IF in_a>=in_b THEN x<=in_a; y<=in_b; ELSE y<=in_a; 
x< = in_b; END IF; t6<=" <expr> IF t6>=x THEN 

t7< = t6; ELSE t7<=x; END IF; out_z< = t7; done<= ' 1 ' ; 

END PROCESS;"; 

<expr>::= "(" <expr> ")" <op> "(" <expr> ")" | <var> | 

<var> "(7 downto " <shf_size> ")"; 

<op> I " - 

<var> : : = " X " | " y " | " w " ; 

<shf_size> ::= "1" | "2" | "3" | "4" | "5" | "6" | "7"; 

In the above BNF description the signals “x”, “y”, “w”, “t6” and “t7” are internal 
signals of the VHDL architecture body. The signal “w”, assigned zero, is present to 
make ease the insertion or removal of a terminal, max and min operations are already 
fixed in BNF through IF clause. The original problem was restricted to use input and 
output operands as unsigned type instead of integer type, reducing computing effort. 



5.2 GP Parameters 

The main control parameters of the GP for the first stage {F, fitness) of the proposed 
experiment are shown in Table 1. In the second stage of the evolution {F^ fitness) the 
GP parameters are the same, except for the objective and raw fitness rows of Table 1 
that change to incorporate the search for a design with optimal area. 



5.3 Results 

The EPF8282A FPGA from Altera FLEX 8000 family, with up to 282 logic elements 
(LEs), was used as target device. Each LE consists of a look-up table (LUT), a func- 
tion generator that computes any function of four variables, and a programmable 
register. In this experiment, area will be measured in number of LEs. The target func- 
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tional specification error (te) was set to 2.5%. It establishes for each fitness case the 
maximum relative error between the simulated value of the dependent variable, pro- 
duced by the synthesized VHDL programs, and the desired value. 



Table 1. GP tableau for the first stage of the SRA design 



Objective: 


Find a VHDL program that implements a two argument func- 
tion that fits a sample of 20 I/O sets (a., b., F(a,b.)}, where the 
target function is F(a,b) = sqrtia + b^). 


Terminal symbols: 


X, y, w, -I-, -, 7 downto (logical shift-right operator), 1, 2, 3, 4, 
5, 6, 7 (index of shift-right operator) and the sentences between 
quotation marks on the right-hand side of <fe>. 


Non-terminal sym- 

bols: 


<fe>, <expr>, <op>, <var>, <shf_size>. 


Fitness cases: 


Sample of 20 sets of data points (a, b., F(a., b.)|, a and b. ran- 
domly chosen from the interval [0, 255] and satisfying the 
restriction sqrt(a^ ■¥ b3) < 255. 


Raw fitness: 


The average, taken over 20 fitness cases, of the absolute value 
of the difference between the value of the dependent variable 
produced by simulating synthesized VHDL program and the 
target value of the dependent variable (Cf. eq.(l)). 


Selection Method: 


Fitness-proportionate. 


Main GP Parameters: 


M=100, G=51, Pc=0.73, p„,= 0.12, without elitism. 



A population size of 100 individuals was used, evolving up to a maximum of 51 
generations. Experiments have been carried out with N=20, w,=l and k(lu) = 
0.25e‘°°'^'’*’“’, which guarantees k(lu) e [0.5, 1.0] for lue [50,100] (Cf. eq.(3) and 
eq.(4)). The initial population was created using the ramped half-and-half generative 
method [21], replacing GPK initialization procedure. 

The first functional individual, i.e., one that met all fitness cases evaluations with 
the functional specification target error of 2.5% was created at generation 18. The 
system started, then, the search for an implementation with optimal area. During the 
second stage of the evolution a variety of individuals agreeing with the specified te 
value emerged, although it happened with a non-optimal area use. Functional indi- 
viduals with 95, 92 and 90 LEs have been reported. After 51 generations the best 
individual was on hand. It was selected at generation 29. Its main fitness (F) was 
0.827 and the average functionality error (/e^,.), 1.103. This individual was synthesized 
with 78 LEs. The statement part of this VHDL program was: 

BEGIN 

w <="00000000"; 

PROCESS BEGIN WAIT UNTIL (elk 'event and clk= ' 1 ' ) ; 

IF in_a >= in_b THEN x <= in_a; y <= in_b; 

ELSE y <= in_a; x <= in_b; END IF; 

t6 <=((y(7 downto 1) ) + ( (x) - (x (7 downto 3))))+(y(7 downto 6)); 

IF t6 >= x THEN t7 <= t6; 

ELSE t7 <= x; END IF; 

out_z <= t7; done <= 

END PROCESS; 
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Fig. 5 shows (a) the evolution of the fitness of the best individual and (b) its area 
versus generation curve. In Fig. 5 (b) it can be noted that at generation 13 the best 
individual increases its area in order to achieve better accuracy. At generation 18 it 
reaches the desired accuracy and begins to look for optimal area, which is found at 
generation 29. 



fitness 
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generation 
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fitness/LE 




generation 



b) 

Fig. 5. (a) Histogram of the fitness and (b) of the area of the best individual 



Table 2 reports optimal area implementation (LE column) obtained for other values 
of the functional specification target error (te) with the respective hardware descrip- 
tions. An interesting result shown in Table 2 refers to the individual with optimal area 
for te=25% that do not use the internal signal y (minimum between the two inputs). 
This indicates that the system actually looks for optimal area, taking into account 
route and resource issues of the synthesis tool. 

In some independent runs it could be noted that the first functional individual was 
also the one with optimal area. Nevertheless, this is not always true, as was shown in 
the SRA design for te=2.5%. Also, an experience was made adding the multiplier and 
shift-left operators on the right-hand side of <op> and <expr> non-terminal symbols, 
respectively (Cf. Subsection 5.1). The system showed to be robust by eliminating 
individuals with these two operators from the early generations. 
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Table 2. LE quantities for other te values with respective hardware descriptions 



te 


LE 


Hardware description 


25% 


51 


X + x(7 downto 3) 


15% 


58 


X + y(7 downto 2) 


12% 


59 


X + y(7 downto 1) 


5% 


70 


X - x(7 downto 3) + y(7 downto 1) 



The drawback of the proposed methodology is its excessive time consumption. 
Currently available tools for compiling VHDL programs are slow, particularly the 
place-and-route ones [31]. To overcome this problem a pre-stage, to be run before the 
start of the main optimization process shown in Fig. 1, was introduced. In the pre- 
stage the system evolves a single VHDL program as in [16], each individual is de- 
scribed in a single VHDL PROCESS to bring the whole population close to the target 
functionality. No place-and-route is accomplished at this time. The pre-stage finishes 
when the first functional individual comes out. 

With this revised approach the system runs three times faster while remaining at 
the pre-stage. It was observed that the performance of the overall design process im- 
proves considerably when compared with the original approach (Fig. 1). 



6 Conclusion 

A methodology based on Genetic Programming paradigm to evolve optimized im- 
plementations was presented. It implies the use of a multiobjective evolutionary tech- 
nique. The methodology is fully automatic in the sense that it is not necessary to write 
code but define a BNF grammar and specify the design constraints. Cycling through a 
place-and-route process makes it possible to consider the exact area used by each 
implementation in the optimization. 

The proposed methodology is capable of synthesizing combinational circuits, using 
basic digital or mathematical operators declared in a BNF, which approximates a 
given function with a specific accuracy while looking for optimal area. From this 
work it can be concluded: 

i) It is not necessary to exactly specify the basic operators in the BNF. 

ii) Performance optimization may be achieved instead of (or in addition to) area if 
one considers timing parameters in the objective function. 

iii) The methodology is synthesis-tool independent. 

Finally, the methodology can be used to implement a new function in a semi-filled 
FPGA, i.e., in addition to an already described function (in a VHDL PROCESS of the 
VHDL program). In this case the new function is described in a new VHDL 
PROCESS. The code of the already described function is maintained fixed while the 
new code will evolve. The system will look for synthesizing the new function using 
any common construct between both functions in order to reduce area usage. 
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Abstract. To improve the efficiency of DNA computing, we need to 
prevent hybridization errors. In this paper, we focus on the bulge loop 
structures, which cannot be prevented by current sequence design meth- 
ods. To estimate the formation of bulge loop structure, we measured the 
melting temperature (Tm) and the binding intensity of sequences with a 
single bulge loop in some simple experiments. Based on the experimental 
results, we mainly discuss the effect of the base type and the position of 
the loops. We also discuss the possibilities for modeling the formation of 
bulge loops based on these experimental results. 

Keywords: DNA computing. 



1 Introduction 

Hybridization processes are the most important processes in DNA computing 
and the efficiency needs to be improved for use in all applications. To control 
the chemical reactions of DNA, we must design the sequence set adequately. 

Recently, two design strategies have been proposed for preventing a hy- 
bridization error called mis-hybridization. One expresses the binding intensity 
between two sequences as the number of Watson-Crick complementary base 
pairs [ l/Klfj . and the other is based on a thermodynamics model [EJ- The 
first strategy is represented by the “H-measure” function to take the minimum 
Hamming distance obtained by successively shifting and lining up the WC-com- 
plement of one sequence against another |0. However, these strategies do not 
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prevent the formation of a secondary structure such as a bulge loop or a pseudo- 
knot. Understanding how a secondary structure forms is more important in the 
case where long sequences are created from several short sequences. The lat- 
ter strategy calculates the intensity of a secondary-structure formation based 
on free energy. Andronescu et al. proposed an algorithm to prevent sequences 
from forming a secondary structure m- To estimate this formation they formu- 
lated three functions, which evaluate the free energy of a stacked pair, a hairpin 
loop, and an internal or a bulge loop. Related to the thermodynamics approach, 
predicting the melting temperature (Tm) is useful for constructing the criteria 
(namely, the evaluation function) for a sequence design that excludes sequences 
that may potentially form a secondary structure. As a practical method, we can 
use the nearest-neighbor model (NN-model) to predict the Tm value and the 
thermodynamics parameters, namely, the enthalpy and the entropy of all dimers 
in a sequence WEmm- 

For example, to predict the Tm of duplex S’-AGTG-S’/S’-TCAC-S’ at a con- 
centration of 4 X 10“® M, according to the NN model, the enthalpy and entropy 
of 5’-AGTG-373’-TGAC-5’ are calculated with the following equation: 

5’-AGTG-373’-TGAC-5’ = initiation -b symmetry -b 5’-AG-373’-TC-5’ -b 5’- 
GT-373’-CA-5’ -b 5’-TG-373’-AC-5’ 

Here, the initiation term describes the contribution whether terminal base 
pair is closed by an A-T pair or a G-C pair. The symmetry term describes the 
contribution whether sequence is palindrome or not. 

By using of Hatim’s parameters 0, the enthalpy of 5’-AGTG-373’-TCAG-5’ 
is calculated as (2.3 -b 0.1) -b 0 -b (-7.8) -b (-8.4) -b (-8.5) = -22.3 kcal/mol. 
Identically, the entropy is (4.1 - 2.8) -b 0 -b (-21.0) -b (-22.4) -b (-22.7) = -64.8 
cal/molK. 

Finally, the Tm of a non-palindrome sequence is calculated with the following 
equation 1111113141 ) 1 : 

"" RlnCt/^ + AS ^ ^ 



Therefore, the Tm of 5’-AGTG-373’-TGAG-5’ is approximately 299K(=26°C). 

The NN-model is currently only applicable to a completely complementary 
duplex or to a duplex containing single mismatches or dangling ends. 

In this paper, we focus on the bulge loop structure. Sequence design based on 
the H-measure cannot exclude sequences that form a bulge loop. As mentioned 
above, the NN-model cannot calculate a duplex with a bulge loop because the 
thermodynamics parameters of the bulge loop are still unknown. Although An- 
dronescu et al. proposed a design strategy, which also considers the bulge loop, 
they modeled the binding intensity of the bulge loop according to the length of 
the loop only [|C]J. We investigated the effect of the base type of the loop and the 
loop position (i.e., a loop near the end or in the middle). To accomplish this, we 
measured the Tm of the sequences that had a single bulge by varying the base 
type and the loop position. 
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In addition, we investigated the relation between the Tm and the binding 
intensity in simple experiments. Based on the results, we discuss the importance 
of the Tm in DNA computing. 

Furthermore, the reaction of DNA proceeds autonomously, which is called 
“self-assembly” . For example, by designing DNA sequences adequately, they can 
form a two-dimensional structure called DNA tile by self-assembling. This pro- 
cess is part of the computational capacity of DNA as an evolvable system. Al- 
though DNA computes by self-assembling, an unintended bulge loop may cause 
hybridization errors. Understanding the tendency of bulge-loop formation is use- 
ful for all DNA computing models including evolvable computation. 

2 Measuring the Tm 

2.1 Measuring the Tm with “SYBR Green I” 

We measured the Tm by using the smart cycler system (Cepheid), which uses 
a thermal cycler to control the temperature and to measure the fluorescence 
intensity of a solution containing DNA. The SYBR Green I is a non-sequence 
specific intercalator, which binds with the double strand region and fluoresces. 
As the ratio of double strands in the solution decreases, the fluorescence intensity 
decreases. Therefore, by raising the temperature, the fluorescence intensity can 
be reduced. The fluorescence intensity of SYBR Green I versus temperature 
profiles (melting curves) were acquired by ramping the temperature from 35 to 
95 °G. The Tm is determined as the point where the gradient of the melting 
curve is the steepest. The buffer used for measuring the Tm was 20 mM Tris-HGl 
(pH 7.4), 8 mM MgGlz, 7.5 mM DTT, and 0.05 iigj BSA. 

2.2 Reproducibility of the Smart Cycler System 

The reproducibility of the smart cycler system was examined by measuring the 
Tm between the ligand and the non-mutated analyte (see Table Q. Figure Q 
shows the two curves, one is the frequency distribution curve plotted with 57 
Tm values we measured, and the other is the ideal normal distribution curve 
derived from the average and the standard deviation of the measured Tm. The 
distribution of the measured Tm approximately follows the ideal normal distri- 
bution curve, though more experimental data is needed. Because the average is 
71.0 °G and the standard deviation 0.44 °G, the Tm of a sequence that is more 
than 72.32(71.0-|-3-0.44) °G or less than 69.68(71.0-3-0.44) °G is statistically 
valuable with a 0.997 probability. 

3 Sequence Design 

3.1 Design Strategy 

As a subject for analysis, we focus on a sequence pair, whose one sequence 
complements the other. We call one sequence the “ligand^^ and the other the 
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Fig. 1. Reproducibility of the smart cycler system 



analyte" . Then, we designed sequences with a single bulge loop by changing the 
analyte. We tested the effect of sequences with a single bulge loop by comparing 
the binding of the ligand to the mutated analytes that have a single bulge loop 
with that of the ligand to a non-mutated analyte. 

To measure the Tm of the sequences accurately, we need to design sequences 
that demonstrate an “all-or-none” two-state transition (i.e., random coil < — >■ 
double helix) For example, self-complementary sequences should be 

modeled on a three-state transition model instead of on a two-state one because 
self-complementary sequences have the potential to form both a double helix 
and a hairpin structure. To obtain this property, sequences must be designed 
to minimize the possibility of forming shift hybridization duplexes or stable 
alternative secondary structures. 

We evaluated the degree of likelihood of a shift hybridization forming between 
the ligand and the analyte by using the following formula based on H-measure |0. 



f Shift'- max {n- H{x,a (y))} 

—n<k<0,0<k<n 



( 2 ) 



where n is the sequence length (=20), x denotes the ligand and y the analyte, 
H{*, *) denotes the Hamming distance, y denotes the complementary sequence 
of y, denotes the right (left) shift in case of fc > 0 (fc < 0), respectively, and 
k denotes the number of shifts. Note that in the case of fc = 0, f shift = 20 
because an analyte is a completely complementary sequence of a ligand. As the 
value of this formula is large, the possibility of a shift hybridization forming is 
large based on the hypothesis that the binding intensity of a hybridization can 
be approximated by the number of complementary base pairs. 

Similarly, the possibility of an alternative secondary structure forming is 
evaluated by the following formula obtained by modifying eq ( 0 . 

fseif ■= max {n- H{x,a’"{x^))} (3) 

— n<.k<.n 



where x^ denotes the reverse sequence of x and all remaining variables are the 
same as that of eq ( 0 . 
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A consecutive base, for example ’GGG’, tends to facilitate secondary struc- 
ture formations such as hairpin loops and pseudo knots. We thus prefer to avoid 
sequences containing a consecutive base. 

In addition, in DNA computing research, most sequences are designed to have 
a 50% GG content to prevent secondary structures forming. We also imposed 
this constraint to the ligand. 

Finally, the ligand has the following features: 

1- fshift = 6 

2 - fseif = 8 

3. no consecutive base 

4. 50% GG content 



The analytes with a single bulge were obtained by deleting a 3’-end base 
and inserting a new base into various positions in the analytes. We decided not 
to the base, which was located in the loop position, in the consecutive base 
in sequence, because it may have changed the loop position (Table Q. For ex- 



ample, 3’-GATGATATGCGTCTGCGATG-5’/5’-CTACTATACGCA[^GACGCTA- 
3’ ( G is located in the loop position) could form an alternative bulge loop, 3’- 
GATGATATGCGTCTGGGATG-5’/5 ’-CTACTATACGCAg[^ACGCTA- 3’. There- 



fore, to investigate the effect of the position of a single bulge exactly, we decided 
to insert the base as described above. 



3.2 Evidence of Forming a Bulge Loop 

Do the ligand and the mutated analyte with a single bulge really form a bulge 
loop structure? In other words, can they form the base pairs on one side sequence 
of a loop position and dangle on the other side sequence? To verify this, we com- 
pared the Tm between the ligand (3’-GATGATATGCGTCTGCGATG-5’) and mu- 
tated analytes that had an internal single bulge (e.g. 5’-CTACTATACG | T | cAGA 
CGCTA-3’) with that between the ligand and “half-complementary” analytes (5’- 
CTACTATACG TCTCGCGAGT- 3’ or 5’- TGTAATGCGA CAGACGCTAC-3’), 
which are complementary to the ligand located halfway from the 5’ or 3’-end of 
the sequence, respectively (see Table P. 

The Tm between 3’-GATGATATGCGTCTGCGATG-5’ and 5’-CTACTATAG 
Gpf]cAGACGGTA-3’ is 62.0 °C, while that between 3’-GATGATATGCGTCTGCG 
ATG-5’ and 5’-GTAGTATAGG TCTCGGGAGT- 3’ is 38.2 °G and that between 
3’-GATGATATGCGTCTGCGATG-5’ and 5’- TGTAATGCGA CAGACG CTAC-3’ 
is 52.5 °G. These results show an interaction between a ligand and a mutated 
analyte with an internal single bulge on both sides of the loop position in the 
sequence. Therefore, we confirmed that at least one bulge loop is formed in the 
middle. 



3.3 Duplex Stability of DNA with a Single Bulge 

To investigate the duplex stability of DNA with a single bulge, we compared 
the Tm between the ligand and the mutated analyte with a single bulge to 
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Table 1. Designed Sequences: The underlined base is the mismatched base pair, while 
the boxed base is the bulge loop. fThese sequences were used to investigate the binding 
intensity of sequences with a single bulge by varying the loop position. JThese sequences 
may form hairpin structures. 



Name 



Sequence 



Tm with 



Ligand 

Analyte 



mutated analyte 
with a single mismatch 



3’-GATGATATGCGTCTGCGATG-5’ 

5’-CTACTATACGCAGACGCTAC-3’ 

5’-ATAGTATACGGAGACGCTAC-3’ 

5’-TTAGTATAGGGAGAGGGTAG-3’ 

5’-GTAGTATAGGGAGAGGGTAG-3’ 

5’-GTAGAATAGGGAGAGGGTAG-3’ 

5’-CTACCATAGGGAGACGGTAG-3’ 

5’-CTACGATACGCAGAGGCTAC-3’ 

5’-GTAGTATAGAGAGACGGTAG-3’ 

5’-GTAGTATAGTCAGAGGGTAG-3’ 

5’-CTACTATACCCAGACGCTAC-3’ 

5’-GTAGTATAGGGAGAGAGTAG-3’ 

5’-GTAGTATAGGGAGACTGTAG-3’ 

5’-CTACTATACGCAGAGCCTAC-3’ 

5’-GTAGTATAGGGAGACGGTAA-3’ 

5’-CTACTATACGCAGACGCTAT-3’ 

5’-CTACTATACGCAGACGCTAG-3’ 



the ligand (°G) 



71.0 
68.6 

69.7 

68.8 
65.5 

65.8 

65.2 

60.9 

58.2 

59.4 
62.7 

61.3 

63.9 

68.3 

69.1 

69.5 



mutated analyte 
with a single bulge 



Half- 

complementary 



5’-GT C ACTATACGCAGAGGCTA-3’ f 



5’-CT G AGTATAGGGAGAGGGTA-3’ 



5’-GTAG A TATACGCAGAGGCTA-3’ 



5’-CTAC G TATACGCAGACGCTA-3’ f 



5’-GTAGTA C TACGCAGAGGCTA-3’ f 



5’-CTACTA G TAGGGAGAGGGTA-3’ 



5’-CTACTATA T GGCAGAGGGTA-3’ 



5’-GTAGTATA G GGGAGAGGGTA-3’ f t 



5’-GTAGTATAGG A CAGAGGCTA-3’ 



5’-CTACTATACG T CAGACGCTA-3’ f 



5’-CTACTATAGGCA T GAGGGTA-3 



5’-GTAGTATACGGA C GACGCTA-3’ f 



5’-CTACTATAGGCAGA T GGGTA-3 



5’-GTAGTATAGGGAGA G GGGTA-3’ f t 



5’-GTAGTATAGGGAGAGG A CTA-3’ 



5’-CTACTATACGCAGACG T CTA-3’ f 



5’-GTAGTATACGGAGACGGT C A-3’ f 



5’-CTACTATAGGCAGAGGGT G A-3’ 



5’-CTACTATAGGTCTCGCGAGT-3’ 



5’-TGTAATGCGAGAGACGGTAG-3’ 



68.7 

67.0 

64.4 
66.2 

65.7 

64.3 

64.0 

66.3 

63.5 

62.0 

65.3 
63.9 

61.7 

59.5 

64.7 

64.3 

69.5 

69.7 
38.2 

52.5 
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that between the ligand and the mutated analyte with a single mismatch (Table 
p. For example, the Tm of a middle bulge 5’-CTACTATACGpr]CAGACGCTA-3’ 
is 62.0 °C, while that of a middle mismatch 5’-GTAGTATAGTGAGAGGGTAG- 
3’ is 58.2 °C. In terms of the stability defect in the middle of the sequence, 
a single bulge loop is more stable than a single mismatch. This means that 
sequence design strategies, which overlook the formation of an undesired bulge 
loop structure, lead to serious errors in the hybridization process. 

In terms of the stability defect near the end of a sequence, more experimental 
data are needed. 



4 Experiments 

4.1 The Tm of Sequences with a Single Bulge 

We investigated the effect of the base type and the position of the loop by 
measuring the Tm. The results are shown in Figure P The Tm differed 2.3 ° 
C maximum with varying base type and 9.0 °C with varying loop position. 
Therefore, the effect of the loop position is greater than that of the base type. 
In addition, the distribution of the effect of the base type is the same level in 
relation to the loop position. 




Fig. 2. Tm of sequences with a single 
bulge with varying base types and po- 
sitions of the loop. Note that loop po- 
sitions 0 and 20 are completely com- 
plementary duplexes (not bulge loop 
structures) . 




Fig. 3. Fluorescence intensity of se- 
quences with a single bulge with vary- 
ing loop positions. Note that loop po- 
sition 0 and 20 are completely com- 
plementary duplexes (not bulge loop 
structures). 



4.2 The Binding Intensity of the Sequence with a Single Bulge 

We investigated the binding intensity of a sequence with a single bulge by varying 
the loop position and applying polyacrylamide gel electrophoresis. 

First, we mixed the ligand, the non-mutated analyte, and each mutated an- 
alyte with a single bulge by varying loop positions labeled with a fluorescence 
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label (FITC) at 5’-end into the solution at a concentration ratio of 1:1:1. In other 
words, each mutated analyte with a single bulge bound to the ligand competed 
with the non-mutated analyte. The mutated analytes used for this experiment 
were selected by arbitrarily picking out one sequence per loop position, so we 
had nine sequences in total (see Table 0. The buffer was 20 mM Tris-HCl (pH 
7.4), 8 mM MgCl 2 , 7.5 mM DTT, and 0.05 ^ig/^A BSA. 

Second, the mixture was heated to denature the DNA and make it single- 
stranded. Then, the temperature was lowered to either 2.0, 0.8, or 0.1 °C/s. 

Third, each mixture was analyzed on a 10% polyacrylamide gel by measuring 
the fluorescence intensity of the FITC (Fig. 0. Note that only the mutated an- 
alytes were labeled with the FITC. Therefore, the fluorescence intensity showed 
the amount of mutated analytes with a single bulge. 

Each mixture was separated into a double-strand (upper band) and a single- 
strand (lower band) by the gel electrophoresis. The thickness of the upper band 
corresponded to the amount of DNA forming a bulge loop structure (i.e., the 
duplex between the ligand and the mutated analytes). On the other hand, the 
thickness of the lower band corresponded to the amount of mutated analytes 
that could not form a duplex. Therefore, the ratio of the fluorescence intensity 
of the upper band in each lane represented the binding intensity of each bulge 
loop structure. 

A plot of the ratio of the fluorescence intensity of the upper band vs. the loop 
position is shown in Fig. 0 Roughly speaking, the effect of decreasing the Tm 
of the bulge loop in a middle position is greater than that of a bulge loop near 
the end. We did not observe any particular effect of the temperature gradient 
during the hybridization process. This indicates that the hybridization time may 
be reduced. 




3 5 7 9 II 13 15 17 19 



Fig. 4. Gel electrophoresis on a 10% polyacrylamide gel. The upper band is a double- 
strand DNA, and the lower band is a single-strand DNA with an FITC. The lanes 
can be divided every three lanes by varying the temperature gradient in hybridiza- 
tion. The number in this figure corresponds to the position of the bulge loop. The 
temperature gradient during the hybridization is 2.0 °C/s, 0.8 °C/s and 0.1 °C/s 
from left to right in each group. The lower bands of lanes 9 and 15 are lower than 
others. This suggests the possibility that 5’-CTACTATA ^^ CGCAGACGCTA-3’ and 
5 ’-CTACTATA [~G~| cGCAGACGCTA- 3’ form a hairpin structure. 
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5 Discussion 

The experimental results confirmed that the Tm correlates with and seems to 
be a good indicator for estimating the binding intensity (correlation coefficient 
= 0.82). Both the base type of the ioop and the ioop position influence the 
destabiiization effect of the buige ioop. Therefore, at feast two factors shouid be 
considered when modeiing the formation of the buige ioop structure. Because 
the distribution of the effect of the base type was the same ievei in reiation to 
the ioop position in this experiment, we couid consider the effect of the base 
type and the position of the ioop independentiy assuming that the two factors 
do not influence each other. In the future, we wiii modei the binding intensity of 
the buige ioop structure by appiying the NN-modei and incorporating the effect 
of the ioop position. 

In Figure 0 the lower bands (containing a singie-strand) of fanes 9 and 15 
were lower than the others. This implies the formation of an intramolecular 
structure, such as a hairpin. However, the Tm of the bulge loop, which was 
positioned at the 9-th base from the 5’-end was not extremely low, while the Tm 
of the bulge loop positioned at the 15-th base from the 5 ’-end was extremely low 
(see Table p. Considering these results, we hypothesized that the sequence in 
lanes 9 formed an intramolecular structure that was different from the structure 
formed by the sequence in lanes 15 and that only that sequence formed the 
destabilizing structure. 

The constitutive question in this study is whether the experimental results 
are sequence-independent or not. To answer this, we performed the following 
experiments. We again tested the relationship between the loop position and the 
Tm with (l)the same analyte and mutated ligand (not analyte) with a single 
bulge or (2)a different ligand and analyte set (called ‘'digand2''’ and analyte2'\ 
expediently). If the above experimental results are sequence-independent, the 
results of these experiments should show a similar trend. Figure 0 shows the 
results of testing the same analyte and mutated ligand with a single bulge and 
figure 0 shows those of the Ugand2 and analyte2 set. Both results indicate that 
the bulge loop near the end is stabilized, while the one located in the middle 
is relatively destabilized. The correlation coefficient is 0.93 between the results 
of the ligand and the mutated analyte with a single bulge and those of the 
analyte and mutated ligand with a single bulge. The correlation coefficient is 
0.84 between the results of the ligand and the mutated analyte with a single 
bulge and those of Ugand2 and mutated analyte2 with a single bulge, while 
it is 0.89 between the results of the analyte and the mutated ligand with a 
single bulge and those of Ugand2 and mutated analyte2 with a single bulge. 
These results prove that the relation between the loop position and the Tm is 
sequence-independent . 

6 Conclusion 

In this paper, we examined the Tm and the binding intensity of sequences with a 
single bulge loop. Based on these data, we discussed some factors of the formation 
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Fig. 5. Tm of sequences with a sin- 
gle bulge with varying base types and 
positions of the loop in ligand. Note 
that loop positions 0 and 20 are com- 
pletely complementary duplexes (not 
bulge loop structures). 



Fig. 6. Tm of Ugand2 and mutated 
analyte2 with a single bulge with vary- 
ing base types and positions of the 
loop. Note that loop positions 0 and 20 
are completely complementary duplexes 
(not bulge loop structures). 



of bulge loops in hybridization processes. The Tm correlated with the binding 
intensity with a correlation coefficient of 0.82. The base type of the loop and the 
loop position influenced the Tm and the binding intensity. Roughly speaking, 
the bulge loop near the end was stabilized, while the bulge loop located in the 
middle was relatively destabilized. The Tm measurements suggested that the 
effect of the base type of the loop was not influenced by the loop position and 
vice versa. This result implied that the binding intensity of the bulge loop could 
be properly modeled by considering the effects of the base type and the position 
of the loop independently. The modeling enables estimating the binding inten- 
sity of the bulge loop. Though this study revealed the qualitative relationship 
between the loop position and the binding intensity, more detailed experiments 
will enable estimating the relationship quantitatively. The quantitative data of 
the destabilized effect of a loop position can be used for the evaluation function 
of the sequence design. DNA sequences generated with this function will enable 
reducing hybridization errors in DNA computing. 
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Abstract. This paper introduces a quantum computing algorithm called “QNN” 
(Quantum Neural Networks) which measures quantum mechanically and simul- 
taneously the fitness values of all 2” possible chromosomes of N hits used to 
specify the structure of the networks. Previous attempts to apply quantum com- 
puting algorithms to evolutionary systems (e.g [1-4]) applied classical comput- 
ing evolutionary algorithms to the choice and sequence of quantum operators, 
which is a hybrid approach (i.e. the EAs were classical, and the applications 
were quantum mechanical). Our QNN algorithm, on the other hand, is fully 
quantum mechanical, in the sense that the fitnesses are calculated quantum me- 
chanically as well, thus allowing the fitness values of all possible 2” chromo- 
somes to be measured simultaneously. Evolutionary algorithms (EAs) are a 
form of sampled search in a huge search space (of 2"^ points). If N is large, then 
2” is astronomically large and computationally intractable. The QNN algorithm 
thus undermines the implicit basic assumption applicable to the field of evolu- 
tionary systems (ES), namely that one must employ a sampled search approach 
(i.e. an evolutionary algorithm (EA)) to explore the huge search space. The 
QNN algorithm is a form of what is called in this paper a “total search” algo- 
rithm. The whole space is searched and is done simultaneously, which makes 
the adjective “evolutionary” in the term “evolutionary systems” redundant. One 
can speculate that as the number of qubits implemented in real systems in- 
creases (currently the state of the art is 7 [5]), then it is likely that the current 
emphasis on evolutionary approaches to optimization problems and complex 
system building, will fade away and be replaced by the “total search” approach 
allowed by quantum computational methods. 

Keywords: Quantum Algorithm, Neural Networks, Evolutionary Computation, 
Total versus Sampled Search, Quantum System Building 



1 Introduction 

As stated in the abstract above, the primary aim of this paper is to show that it is pos- 
sible to use a quantum algorithm (that we call “QNN”) to measure simultaneously the 
fitness values of all 2” chromosomes in the population of an evolutionary algorithm 
(EA), hence potentially revolutionizing the field of evolutionary computation (EC) 
and hence those fields that rely upon EC, such as evolutionary system (ES) building. 
This quantum algorithm shows that the “evolutionary” element in EC is no longer 
needed. EAs are a form of “sampled search”, whereas QNN is a form of “total 
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search”, i.e. testing every possible 2^^ points in the search space and doing this simul- 
taneously, using the quantum computational phenomenon of superposition. 

This paper consists of the following sections. Section 1 is this introduction. Sec- 
tion 2 gives some brief background to the classical approach to neural network evolu- 
tion. Section 3, the largest, presents the QNN algorithm is reasonable detail. Section 4 
discusses some ideas for future work, and section 5 summarizes briefly. 



2 Classical Evolution of Neural Networks 

This section gives a brief overview of classical neural networks and how our group 
has evolved them in the past. The basic ideas of this classical approach to neural net- 
work evolution are carried over to the quantum approach in the QNN algorithm. 

Our group’s classical approach employs fully connected neural networks of N neu- 
rons (hence N^ connections), each of which is “weighted” by an M-bit binary fraction. 
These weights are evolved using a Genetic Algorithm (GA), to generate desired sig- 
naling behaviors for given external inputs. The signal values are reals that range be- 
tween H-1.0 and -1.0. An incoming signal vector (to each neuron) is scalar-producted 
with its weight vector, and an external signal value is added to this product. The sum 
is applied to a sigmoid function whose output is the output signal of the neuron that is 
transmitted to all other neurons. The “fitness” value is defined to be the measure of 
how closely the neural net output follows the given target. It is these fitness values 
that drive the evolutionary process. 



3 The QNN Algorithm 

It is assumed in this paper that readers are already familiar with the basic principles of 
quantum computing. If not, then interested readers may consult the references [8-11]. 
This section describes a quantum-computing-based algorithm called QNN that gener- 
ates neural networks, which will serve as a model (or proof of concept) for future 
work. Our group aims to develop such quantum algorithms further, and to combine 
quantum generated neural circuit modules into quantum artificial brains. (Note: The 
first author’s primary research is in building artificial brains). By a “quantum artificial 
brain” is meant that its components can be generated using future quantum based al- 
gorithms such as QNN, and that the resulting brain can be run using quantum me- 
chanical principles. The construction of a quantum artificial brain would be somewhat 
analogous to the construction of a classical artificial brain except that the former ap- 
proach would use quantum computing and the latter would use classical computing. 



3.1 Basic Ideas 

Considering Fig. 1, 

a) Wll, W22, W12, W21 are the (binary) weights of the specific links in the direc- 
tions indicated by the arrows. Wll, W22, W12, W21 = {0,1}. 

b) NSl and NS2 are the output signals of the neurons. NSl, NS2 = {0,1 }. At the O'" 
clock tick, NSl and NS2 are 0. For all further clock ticks, the outputs of the neu- 
rons are NSl and NS2 (for neuron 1 and 2 respectively). 
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c) EXS is the external signal given to each neuron. EXS = { 1 } 

d) The desired (or target) behavior of this extremely simple neural net circuit is to 
output the minimum number of zeros (i.e. the maximum number of I’s) for all 
clock ticks. 




W12 



Fig. 1. A Simple 2 Neuron Network 



3.2 Description of the Qnet Circuit 

Here we describe the Qnet Circuit (see Eig. 2), operator by operator, for one clock 
tick. 

Step 1) Initialization of the States 

In Eig. 2 the left most boxes represent the qubits of the Qnet Circuit. 

i) The first four boxes Wll, W22, W21, W12 represent the weights of the neural net 
that are prepared in a superposed state using Hadamard gates. These superim- 
posed weights represent the whole search space of all possible chromosomes in 
genetic algorithm terms. 

ii) The fifth and sixth boxes NSl, NS2 represent the signals output by the two neu- 
rons from the previous clock tick. Initially NSl and NS2 are I0>. 

iii) The seventh box EXS represents the external signal given to each neuron initially 
to start the signaling of the neural net. 

iv) The remaining boxes represent the qubits in the pure state as indicated by each 
box, i.e. either I0> or ll>. 

Step 2) Signaling through the Qnet Circuit 

Operators 1-4; The first step in implementing the neural net is to calculate the scalar 
(or dot) product of the weights and the signals using the equations 

(W11*NS1 +W21*NS2) (1) 

(W22*NS2 + W12*NS1) (2) 

We applied four Controlled Controlled Not Gates (CCNot), and fixed the target bit 
at I0>. The circuit will then behave like an AND gate (i.e., (a,b,0) => (a,b,a&b)). We 
used an AND gate to implement the scalar product because we have binary weights 
and signals (which can be either 0 or 1). 

Operators 5-8: After calculating the scalar product, we add the external signal EXS 
to equation (1) above to get NSl for the next clock tick. These operators when com- 
bined, perform the addition of the scalar product and the external signal. They take 
the external signal as the carry in and output the sum of the three terms and the carry. 
This quantum addition is based on the work of Vedral et al [4]. 
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Fig. 2. The 2-Neuron Qnet Circuit 



Operators 9-12: These operators do for NS2 and EXS what operators 5-8 do for NSl 
andEXS. 

Step 3) Threshold Calculation 

After computing the output signal for each neuron, we decide next whether the neuron 
will fire or not by comparing the output signals we calculated from equations (1) & 
(2) (h- EXS) with some threshold value. By threshold value we mean that if the output 
signal value (two bits) of the neuron is greater than or equal to the threshold value, the 
neuron outputs a 1, otherwise a 0, that will be input to the circuit at the next clock 
tick. We take the threshold value to be 1 because then our input signal has only one 
bit (after obtaining the two signals we ORed them and accordingly we passed a 1 or a 
0 ). 

Operators 13-18: These operators when combined, perform the ORing of the two 
signal bits. We built an OR gate with the help of NAND gates and when we analyzed 
the Toffoli gate, we found that if we keep the previous target bit at 1 we get results 
similar to a NAND gate. To build an OR gate we used de Morgan’s law, namely (A’* 
B’)’ = A H- B, i.e. 2 NOT gates and a NAND gate. 

Step 4) Fitness Calculation 

Here the fitness is defined to be the minimum number of O’s output, for all the clock 
ticks. After comparing the signal with the threshold we need to store the output sig- 
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nals (i.e. the NSl output signals of the neural net) to perform the fitness calculation 
afterwards, as well as pass them as neuron signals for the next clock tick. As the out- 
put signal of the network, only one output neuron signal is chosen (for all clock ticks). 
This raises the issue of copying signal values, which is not allowed according to the 
“No-Cloning Theorem”. (Since our signals are in superimposed states, there is no way 
to store them). We therefore place a composite adder in Fig. 2 which gives not only 
the sum of the signal outputs from all previous clock ticks, but also outputs the same 
signal that was input to it. This signal is the neuron output signal for the next clock 
tick. The “composite adder” consists of 6 half adders, as shown in Fig. 3. 

Fig. 3 shows the internal circuit of the composite adder of Fig. 2. In this circuit the 
first block of six half adders represents the composite adder at clock T and the second 
block of six half adders represents the composite adder at clock Th- 1. Fig. 3 shows 
how incoming information is both passed through the circuit and is added to the fit- 
ness sum. Six half adders are taken because we use 64 clock ticks. When we add 64 
bits serially we get a 6 bit binary number (2'’ = 64). In Fig. 3 the first half adder on 
the left side will be given the output signal NSl of a neuron as an input from clock T- 
1 . 

Signals SO to S5 are the sum bits from clock T. The input NS 1 is passed through a 
NOT gate so that if we get a 1 as output we give a 0 as input. We thus add the number 
of zero’s in the output signals, to find the fittest chromosome (with lowest fitness 
value). All the other inputs are qubits prepared in the pure state I0>. The output of this 
composite adder at clock T is given as input to the composite adder at clock Th- 1. We 
obtain some unused bits that we label GB (garbage bits). We repeat this process for 
64 clock ticks, thus obtaining the sum of the output signals for each chromosome. We 
can then apply “Durr’s Algorithm” [12] to find the index position of the minimum 
value in a given set. This index maps to the index of the weights. We can therefore 
find the weights of the fittest chromosome and use the corresponding neural network 
to perform the desired function. 

Step 5) Recycling of the Qubits 

The operators 19 - 22 in Fig. 2 and 23 - 34 in figure 4 are used to reverse the opera- 
tions that were performed earlier, by running them in reverse order. This saves cir- 
cuitry by allowing us to use the same wires. 



3.3 Extending the QNN Algorithm to a 3-Neuron Neural Network 

The circuits for a 3-neuron neural network differ essentially only in the calculation of 
the fitness. Here we present the fitness function circuit for 3-neuron neural network. 
Extending manually such circuits to a greater number of neurons (>3) is tedious but 
doable. (See the idea in the next section for a quantum compiler to perform such tedi- 
ous tasks). 

The remaining circuits for the 3-neuron Qnet circuit can be developed easily, based 
on the 2-neuron Qnet, as shown above. The 3-neuron Qnet obviously requires more 
qubit lines and quantum operators. 
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Fig. 4. A Circuit for Recycling the Qubits 
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Fig. 5. The Fitness Function Circuit for a 3-Neuron Neural Network 



4 Future Work 

This section introduces some ideas for future work on quantum algorithms as applied 
to the generation of neural networks, quantum artificial hrains, etc. 

a) More Complex Algorithms 

Our group would like to devise more elaborate QNN-like algorithms for less toy-like 
neural net evolution applications. In our classical neural net evolution work, we 
evolve neural net modules that control motions, detect patterns, make decisions, etc., 
and use them as components towards building artificial brains [3]. We aim to devise 
similar algorithms using quantum techniques, as far as this is possible. 

b) Quantum Multi-Module Systems & Quantum Artificial Brains 

If one can use a QNN-like algorithm successfully to generate a quantum neural circuit 
module with a given function, then this can probably be done for many different func- 
tions. We intend studying “quantum multi-modules” in order to build larger systems 
such as “quantum artificial brains”. 

c) A “Quantum Compiler” 

Figs. 2-5 show clearly that the Qnets corresponding to quantum neural network cir- 
cuits that have a large number of neurons are tedious to design by hand. What is 
needed is a “Qnet Compiler” or “Quantum Compiler” to generate Qnets automati- 
cally. A quantum compiler would accept high-level English-like statements written in 
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a special “Quantum Neural Network (QNN) Language” (call it “qC”) and translate 

them into Qnets. Such quantum compilers would also need several additional tools 

and features, such as the following - 

1. Qubit Initialization Tool: As in ordinary quantum computing, we will need to 
prepare input qubits in superimposed states (in order to take advantage of quantum 
computing’s massive parallelism due to state superposition). 

2. Parallelism: Quantum operations on the superimposed qubits are performed in 
parallel (in Hilbert Space). This feature would need to be included in the new lan- 
guage “qC”. 

3. Loop Restrictions: Loops are not allowed in quantum computing. A Qnet is there- 
fore required to be run sequentially. A quantum language such as “qC” would not 
allow any kind of For or While loops. 

4. No Fan-out and Fan-in: Fan-out and fan-in are not allowed in quantum comput- 
ing, hence If Then Else statements would similarly not be allowed in the new qC 
language either. 

5. No Copying: Since an exact copy of a superimposed state is not allowed (accord- 
ing to the “No Cloning Theorem” of quantum computing), no intermediate calcula- 
tion at a particular iteration will be allowed. All such calculations should be per- 
formed at the end. 

6. Only One Result: Any algorithm written in qC should produce only one result. 
We know from quantum computing principles that as soon as we perform a meas- 
urement, we destroy the superimposed state. We should therefore save all informa- 
tion until we perform the measurement at the last moment. 



5 Brief Summary 

We believe that the QNN quantum algorithm above shows that it is possible to per- 
form the fitness measurements of all possible 2” points in the search space simultane- 
ously, thus making traditional evolutionary algorithms (EAs) rather redundant. Since 
evolutionary systems (ESs) use EAs as their basic tools, ESs too will be affected. 
Admittedly the QNN algorithm treats only very primitive neural networks, but with 
tools such as quantum compilers to aid the creation of Qnets of great complexity and 
sophistication, elaborate and powerful neural networks could be generated (rather 
than evolved) that could be used as modular components in more complex multi- 
module systems, such as quantum artificial brains. 
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